Regex negative lookbehind not valid in JavaScript [duplicate]

Posted by 不问归期 on 2019-11-26 08:33:53

Question


Consider:

var re = /(?<=foo)bar/gi;

Plunker reports this as an invalid regular expression. Why?


Answer 1:


Before ES2018, JavaScript had no support for lookbehinds like (?<=…) (positive) and (?<!…) (negative), and engines that predate that edition still reject them as a syntax error. That doesn't mean you can't implement this sort of logic in JavaScript.
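To decide at runtime whether a workaround is needed at all, one option is to try compiling a lookbehind and catch the failure. This is a minimal sketch; the function name supportsLookbehind is made up for illustration. It goes through the RegExp constructor because a regex literal with unsupported syntax would stop the whole script from parsing.

```javascript
// Hypothetical helper: reports whether this engine accepts lookbehind syntax.
function supportsLookbehind() {
  try {
    // The pattern is passed as a string so that an unsupported engine
    // throws here at runtime instead of failing when the script is parsed.
    new RegExp("(?<=foo)bar");
    return true;
  } catch (e) {
    return false;
  }
}
```

If it returns false, fall back to one of the capture-group techniques described in this answer.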

Matching (not global)

Positive lookbehind match:

// from /(?<=foo)bar/i
var matcher = mystring.match( /foo(bar)/i );
if (matcher) {
  // do stuff with matcher[1] which is the part that matches "bar"
}

Fixed width negative lookbehind match:

// from /(?<!foo)bar/i
var matcher = mystring.match( /(?!foo)(?:^.{0,2}|.{3})(bar)/i );
if (matcher) {
  // do stuff with matcher[1] ("bar"), knowing that it does not follow "foo"
}

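A negative lookbehind can be emulated without the global flag, but only at a fixed width, and you have to work out that width yourself (which gets difficult with alternations). Using (?!foo).{3}(bar) would be simpler and roughly equivalent, but it demands exactly three characters before "bar", so it cannot match a string starting with "rebar". The alternation in the code above covers that case by also accepting a "bar" that begins before the fourth character. Because . does not match newlines and ^ without the m flag matches only at the start of the string, add the m flag if "bar" may appear near the start of a later line.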
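The fixed-width pattern can be exercised against a few inputs like this. It is a sketch only: the names NOT_AFTER_FOO and findBarNotAfterFoo are invented here, and the position arithmetic relies on group 1 always sitting at the end of the overall match.

```javascript
// Same pattern as above: emulates /(?<!foo)bar/i for a single match.
var NOT_AFTER_FOO = /(?!foo)(?:^.{0,2}|.{3})(bar)/i;

// Hypothetical helper: index of the first "bar" not preceded by "foo", or -1.
function findBarNotAfterFoo(text) {
  var m = text.match(NOT_AFTER_FOO);
  if (!m) {
    return -1;
  }
  // m[0] includes up to three leading characters; "bar" is its tail.
  return m.index + m[0].length - m[1].length;
}

console.log(findBarNotAfterFoo("rebar"));      // 2
console.log(findBarNotAfterFoo("foobar"));     // -1
console.log(findBarNotAfterFoo("foobar bar")); // 7
```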

If you need a variable width, use the global solution below and put a break at the end of the if block. (This limitation is quite common. .NET, vim, and JGsoft are among the few regex engines that support variable-width lookbehind. PCRE, PHP, and Perl are limited to fixed width. Python requires an alternate regex module to support this. That said, the logic of the workaround below should work in any language that supports regex.)
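As an illustration of the "global loop plus break" idea for a variable-width case, the sketch below emulates /(?<!fo+)bar/i, where the forbidden prefix is "f" followed by one or more "o". The pattern and the name firstBarNotAfterFoos are assumptions chosen for the example, not part of the original question.

```javascript
// Hypothetical helper: first "bar" that is not directly preceded by "fo", "foo", "fooo", ...
function firstBarNotAfterFoos(text) {
  var re = /(fo+)?bar/gi; // the optional group captures the forbidden prefix when present
  var m;
  var found = null;
  while ((m = re.exec(text)) !== null) {
    if (!m[1]) {
      // Group 1 did not participate, so this "bar" passes the negative test.
      found = { index: m.index, text: m[0] };
      break; // stop after the first acceptable match
    }
  }
  return found;
}

console.log(firstBarNotAfterFoos("fooobar xbar")); // { index: 9, text: 'bar' }
console.log(firstBarNotAfterFoos("fobar"));        // null
```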

Matching (global)

When you need to loop over every match in a string (the g modifier, global matching), create the RegExp once before the loop, call RegExp.exec() on it, and reassign the matcher variable on each iteration. String.match() treats the global modifier differently: it returns all matches at once, so using it as the loop condition never advances and creates an infinite loop!
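The difference between the two calls is easiest to see side by side. This is a small sketch with a made-up sample string; the variable names are illustrative.

```javascript
var text = "foobar FOOBAR";

// With g, String.match() returns every whole match and drops capture groups.
var allAtOnce = text.match(/foo(bar)/gi);
console.log(allAtOnce); // [ 'foobar', 'FOOBAR' ]

// RegExp.exec() returns one match per call, with groups, and tracks its
// position in re.lastIndex. That only works if the same RegExp is reused.
var re = /foo(bar)/gi;
var first = re.exec(text);
var afterFirst = re.lastIndex;
var second = re.exec(text);
var third = re.exec(text); // no matches left: returns null and resets lastIndex

console.log(first[1], afterFirst); // bar 6
console.log(second[1]);            // BAR
console.log(third);                // null
```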

Global positive lookbehind:

var re = /foo(bar)/gi;  // from /(?<=foo)bar/gi
var matcher;            // declared up front so it is not an implicit global
while ( (matcher = re.exec(mystring)) !== null ) {
  // do stuff with matcher[1] which is the part that matches "bar"
}

"Stuff" may of course include populating an array for further use.

Global negative lookbehind:

var re = /(foo)?bar/gi;  // from /(?<!foo)bar/gi
var matcher;             // declared up front so it is not an implicit global
while ( (matcher = re.exec(mystring)) !== null ) {
  if (!matcher[1]) {
    // do stuff with matcher[0] ("bar"), knowing that it does not follow "foo"
  }
}

Note that there are cases in which this will not fully represent the negative lookbehind. Consider /(?<!ba)ll/g matching against Fall ball bill balll llama. The workaround finds only three of the desired four matches: when it reaches balll, it consumes ball (and discards it because the group matched), then resumes one character too late at l llama, missing the second ll of balll. This only occurs when the end of one candidate overlaps the start of another (balll breaks (ba)?ll, but foobarbar is fine with (foo)?bar). The fix is to use the fixed-width method above.
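The missed match can be reproduced with the sketch below. The second function is not the fixed-width regex method; it swaps in a different technique, a manual check of the two characters before each candidate, purely to show which four positions a real lookbehind would report. Both function names are invented for this example.

```javascript
var sample = "Fall ball bill balll llama";

// Optional-group workaround for /(?<!ba)ll/g: keep matches where group 1 is unset.
function viaOptionalGroup(text) {
  var re = /(ba)?ll/g, m, hits = [];
  while ((m = re.exec(text)) !== null) {
    if (!m[1]) {
      hits.push(m.index);
    }
  }
  return hits;
}

// Manual check (an alternative technique): find each "ll", look at the two
// characters before it, and on rejection retry from one character later.
function viaManualCheck(text) {
  var re = /ll/g, m, hits = [];
  while ((m = re.exec(text)) !== null) {
    var before = text.slice(Math.max(0, m.index - 2), m.index);
    if (before !== "ba") {
      hits.push(m.index);
    } else {
      re.lastIndex = m.index + 1; // allow an overlapping candidate
    }
  }
  return hits;
}

console.log(viaOptionalGroup(sample)); // [ 2, 12, 21 ]
console.log(viaManualCheck(sample));   // [ 2, 12, 18, 21 ]
```

Index 18 is the second "ll" inside "balll", the match the optional-group version skips.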

Replacing

There's a wonderful article called Mimicking Lookbehind in JavaScript that describes how to do this.
It even has a follow-up that points to a collection of short functions that implement this in JS.

Implementing lookbehind in String.replace() is much easier since you can create an anonymous function as the replacement and handle the lookbehind logic in that function.

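These act on the first match of the workaround pattern only. Note that this first match may be a "bar" that fails the lookbehind test and is therefore left unchanged, in which case nothing is replaced. Add the g modifier to process every match.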

Positive lookbehind replacement:

// assuming you wanted mystring.replace(/(?<=foo)bar/i, "baz"):
mystring = mystring.replace( /(foo)?bar/i,
  function ($0, $1) { return ($1 ? $1 + "baz" : $0) }
);

This replaces bar with baz in the target string only where it follows foo. When it does, $1 holds the matched foo and the ternary operator (?:) returns $1 followed by the replacement text, dropping the bar part. Otherwise, the ternary operator returns the original match $0 unchanged.
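Wrapped in functions, the positive case looks like the sketch below. The names replaceBarAfterFoo and replaceFirstOnly are illustrative; the second function keeps the non-global pattern to show what happens when a bare "bar" comes first.

```javascript
// Emulates text.replace(/(?<=foo)bar/gi, "baz").
function replaceBarAfterFoo(text) {
  return text.replace(/(foo)?bar/gi, function ($0, $1) {
    // $1 is the captured "foo" (original casing kept) or undefined.
    return $1 ? $1 + "baz" : $0;
  });
}

// Same callback without g: only the first "(foo)?bar" match is visited.
function replaceFirstOnly(text) {
  return text.replace(/(foo)?bar/i, function ($0, $1) {
    return $1 ? $1 + "baz" : $0;
  });
}

console.log(replaceBarAfterFoo("foobar bar Foobar")); // foobaz bar Foobaz
console.log(replaceFirstOnly("bar foobar"));          // bar foobar (first match was the bare "bar")
```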

Negative lookbehind replacement:

// assuming you wanted mystring.replace(/(?<!foo)bar/i, "baz"):
mystring = mystring.replace( /(foo)?bar/i,
  function ($0, $1) { return ($1 ? $0 : "baz") }
);

This is essentially the same, but since it's a negative lookbehind, it acts when $1 is missing (we don't need to say $1 + "baz" here because in that branch $1 is undefined).

This has the same caveat as the other dynamic-width negative lookbehind workaround and is similarly fixed by using the fixed width method.
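A global version of the negative replacement might look like this sketch; the function name replaceBarNotAfterFoo is made up for the example.

```javascript
// Emulates text.replace(/(?<!foo)bar/gi, "baz"), subject to the overlap caveat.
function replaceBarNotAfterFoo(text) {
  return text.replace(/(foo)?bar/gi, function ($0, $1) {
    // If "foo" was captured, the lookbehind would have failed: keep the text as-is.
    return $1 ? $0 : "baz";
  });
}

console.log(replaceBarNotAfterFoo("foobar bar rebar")); // foobar baz rebaz
```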




Answer 2:


Here is a way to parse an HTML string with the DOM in JavaScript and perform replacements only in text outside of tags:

var s = '<span class="css">55</span> 2 >= 1 2 > 1';

// Parse the string into a detached wrapper element so the DOM separates tags from text.
var doc = document.createDocumentFragment();
var wrapper = document.createElement('myelt');
wrapper.innerHTML = s;
doc.appendChild(wrapper);

// Replace ">" and ">=" only in text nodes that are direct children of the wrapper.
function textNodesUnder(el) {
  var n, walk = document.createTreeWalker(el, NodeFilter.SHOW_TEXT, null, false);
  while ((n = walk.nextNode())) {
    if (n.parentNode.nodeName.toLowerCase() === 'myelt') {
      n.nodeValue = n.nodeValue.replace(/>=?/g, "EQUAL");
    }
  }
  return el.firstChild.innerHTML;
}

var res = textNodesUnder(doc);
console.log(res);
alert(res);


Source: https://stackoverflow.com/questions/35142364/regex-negative-lookbehind-not-valid-in-javascript
