Question
I am trying to use lookbehinds in a regular expression, and they don't seem to work as I expected. This is not my real use case, but to keep things simple, here is an example. Imagine I want to match "example" in the string "this is an example". According to my understanding of lookbehinds, this should work:
(?<=this\sis\san\s*?)example
This should find "this is an", then any whitespace characters, and finally match the word "example". However, it doesn't work, and I don't understand why. Is it impossible to use '+' or '*' inside lookbehinds?
I also tried these two, which work correctly but don't fulfill my needs:
(?<=this\sis\san\s)example
this\sis\san\s*?example
I am using this site to test my regular expressions: http://gskinner.com/RegExr/
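To reproduce the behavior in code, here is a minimal sketch that uses Python's built-in re module purely as one example of an engine with this restriction (an assumption: the RegExr engine the asker used may behave differently). The variable-length lookbehind is rejected at compile time, while the fixed-length version matches only the word:

```python
import re

text = "this is an example"

# Variable-length lookbehind: \s*? can match zero or more characters,
# so Python's re module refuses to compile the pattern.
try:
    re.compile(r"(?<=this\sis\san\s*?)example")
    variable_error = None
except re.error as exc:
    variable_error = exc
    print("re.error:", exc)

# Fixed-length lookbehind (exactly one \s): compiles, and the match is only "example".
fixed = re.compile(r"(?<=this\sis\san\s)example")
print(fixed.search(text).group())  # example
```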
Answer 1:
Many regular expression libraries only allow restricted expressions in lookbehind assertions, for example expressions that:
- only match strings of the same fixed length:
  (?<=foo|bar|\s,\s)
  (three characters each)
- only match strings of fixed lengths:
  (?<=foobar|\r\n)
  (each branch with a fixed length)
- only match strings with an upper-bound length:
  (?<=\s{0,4})
  (up to four repetitions)
These limitations exist mainly because those libraries can't process regular expressions backwards at all, or can only do so for a limited subset.
Another reason could be to prevent authors from building overly complex regular expressions that are expensive to process because they exhibit so-called pathological behavior (see also ReDoS).
See also the section about the limitations of lookbehind assertions on Regular-Expressions.info.
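As a sketch of how one engine treats the three categories above, assuming Python's built-in re module (chosen only for illustration): it accepts only the first form, where every alternative has the same width. The helper compiles is a hypothetical name. The last lines show a common workaround, not taken from the answer: splitting alternatives of different lengths into separate fixed-width lookbehinds.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's built-in re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Alternatives that all have the same fixed length (3 characters each): accepted.
print(compiles(r"(?<=foo|bar|\s,\s)x"))  # True
# Alternatives with different fixed lengths (6 vs. 2): rejected by re.
print(compiles(r"(?<=foobar|\r\n)x"))  # False
# Bounded repetition (0 to 4 characters): rejected by re.
print(compiles(r"(?<=\s{0,4})x"))  # False

# Workaround for differing lengths: one fixed-width lookbehind per branch.
split = re.compile(r"(?:(?<=foobar)|(?<=\r\n))x")
print(split.findall("foobarx\r\nxbazx"))  # ['x', 'x']
```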
Answer 2:
If you're not using Python, you can trick the regex engine into emulating a variable-length lookbehind assertion by resetting the start of the match with \K.
This site explains it well: http://www.phpfreaks.com/blog/pcre-regex-spotlight-k
In short, when you match an expression and want only what comes after it, \K makes the engine discard everything matched so far from the reported match, so the result starts over at that point.
Example:
string = '<a this is a tag> with some information <div this is another tag > LOOK FOR ME </div>'
Matching it against /(<a).+?(<div).+?(>)\K.+?(?=<\/div)/
resets the start of the match right after the > that closes the opening div tag, so nothing before that point is included in the result. The lookahead (?=<\/div) then makes the engine take everything up to, but not including, the closing </div> tag.
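Python's built-in re module does not support \K, which is why the answer excludes Python. A minimal sketch of the usual emulation in re, assuming the example string and a pattern with the same structure as the one above: keep the prefix in the pattern and put the part after the reset point into its own capturing group (group 4 here).

```python
import re

html = '<a this is a tag> with some information <div this is another tag > LOOK FOR ME </div>'

# Python's re treats \K as an unknown escape and raises re.error.
try:
    re.compile(r"(>)\K.+?")
    k_supported = True
except re.error as exc:
    k_supported = False
    print("re.error:", exc)

# Emulation: the prefix that \K would discard stays in the pattern,
# and the text after the reset point is captured in group 4.
pattern = re.compile(r"(<a).+?(<div).+?(>)(.+?)(?=</div)")
m = pattern.search(html)
print(repr(m.group(4)))  # ' LOOK FOR ME '
```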
Answer 3:
What Amber said is true, but you can work around it with another approach: a non-capturing group.
(?<=this\sis\san)(?:\s*)example
That makes the lookbehind fixed-length, so it should work. Note, however, that the whitespace matched by (?:\s*) becomes part of the match.
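A quick check in Python's re module (an assumption; any engine that supports fixed-width lookbehind should behave the same way) shows what this pattern actually returns: it compiles and matches, but the match starts with the space consumed by (?:\s*). Capturing the word in its own group is one way to get just "example".

```python
import re

text = "this is an example"

# Fixed-width lookbehind (10 characters); the spaces are consumed outside it.
pattern = re.compile(r"(?<=this\sis\san)(?:\s*)example")
m = pattern.search(text)
print(repr(m.group()))  # ' example'

# One option if only the word is wanted: capture it separately.
word = re.compile(r"(?<=this\sis\san)\s*(example)")
print(word.search(text).group(1))  # example
```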
Answer 4:
Most regex engines don't support variable-length expressions in lookbehind assertions.
Answer 5:
You can use sub-expressions.
(this\sis\san\s*?)(example)
To retrieve group 2 ("example"), use $2, or \2 if you're using a format string (as with Python's re.sub).
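A minimal Python sketch of this approach (Python's re is chosen because the answer mentions re.sub; the sample text with extra spaces is made up for illustration): m.group(2) retrieves the second group, and \1 and \2 in a re.sub replacement template refer to the captured groups.

```python
import re

text = "this is an   example"
pattern = re.compile(r"(this\sis\san\s*?)(example)")

# Group 2 holds just the word; group 1 holds the prefix and any whitespace.
m = pattern.search(text)
print(m.group(2))  # example

# In a re.sub replacement template, \1 and \2 refer to the groups.
print(pattern.sub(r"\1<\2>", text))  # this is an   <example>
```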
Source: https://stackoverflow.com/questions/9030305/regular-expression-lookbehind-doesnt-work-with-quantifiers-or