PCRE: backreferences not allowed in lookbehinds?

Submitted by 雨燕双飞 on 2019-12-19 17:41:18

Question


The PCRE regex /..(?<=(.)\1)/ fails to compile: "Subpattern references are not allowed within a lookbehind assertion." Interestingly it seems to be acceptable in lookaheads, like /(?=(.)\1)../, just not in lookbehinds.

Is there a technical reason why backreferences are not allowed in lookbehinds specifically?


Answer 1:


With Python's re module, group references are not supported in lookbehind, even if they match strings of some fixed length.


Lookbehinds in PCRE don't support the full pattern syntax. Concretely, when the regex engine reaches a lookbehind, it tries to determine the lookbehind's size, and then jumps back that many characters to check the match.

This size determination forces a design choice:

  • allow variable size: the lookbehind then has to be executed to find out how far to jump back
  • disallow variable size: the engine can then jump back directly by a known amount

Although the first solution would be the best for us (users), it's obviously the slowest and the hardest to implement. So the PCRE developers settled on the second solution. The Java regex engine, as another example, allows semi-variable lookbehinds: it only needs to be able to determine the maximum size.
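The trade-off can be made concrete with a toy model. The sketch below is not how PCRE or Java work internally; it only imitates the three strategies on top of Python's re module, and the function names (lookbehind_fixed, lookbehind_bounded, lookbehind_variable) are made up for illustration.

```python
import re

def lookbehind_fixed(text, pos, pattern, width):
    """Fixed size: jump back exactly `width` characters and match once."""
    start = pos - width
    if start < 0:
        return False  # not enough text behind the current position
    return re.compile(pattern).fullmatch(text, start, pos) is not None

def lookbehind_bounded(text, pos, pattern, max_width):
    """Bounded size: try every start within `max_width` characters of `pos`."""
    compiled = re.compile(pattern)
    lowest = max(0, pos - max_width)
    return any(
        compiled.fullmatch(text, start, pos) is not None
        for start in range(pos, lowest - 1, -1)
    )

def lookbehind_variable(text, pos, pattern):
    """Variable size: no bound, so every start from `pos` down to 0 is tried."""
    return lookbehind_bounded(text, pos, pattern, max_width=pos)
```

The fixed version does one match attempt per position, the bounded version at most max_width + 1, and the unbounded version up to pos + 1, which is why the unrestricted form is the most expensive to support.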


I looked at both PCRE and Python's re module.
In the PCRE documentation I found nothing beyond this error code:

COMPILATION ERROR CODES
25: lookbehind assertion is not fixed length

But in this case, the lookbehind assertion is fixed length.
Now, here is what the re documentation says:

The contained pattern must only match strings of some fixed length, meaning that abc or a|b are allowed, but a* and a{3,4} are not. Group references are not supported even if they match strings of some fixed length.
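These rules are easy to check against Python's own re module. The snippet below is only a quick sketch: compiles is a hypothetical helper, and the exact error messages (and the handling of references to groups defined before the lookbehind) may differ between Python versions, so it only reports whether each pattern compiles.

```python
import re

def compiles(pattern):
    """Return True if re accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Fixed-length lookbehind bodies, as in the documentation's examples.
fixed_ok = [compiles(r"(?<=abc)d"), compiles(r"(?<=a|b)c")]

# Variable-length bodies are rejected at compile time.
variable_ok = [compiles(r"(?<=a*)b"), compiles(r"(?<=a{3,4})b")]

# The pattern from the question: a group and a reference to it,
# both inside the lookbehind.
question_ok = compiles(r"..(?<=(.)\1)")

# The lookahead form from the question compiles and finds a doubled character.
doubled = re.search(r"(?=(.)\1)..", "abccd").group()

print(fixed_ok, variable_ok, question_ok, doubled)
```

So re behaves like PCRE on the pattern from the question: the lookahead form is accepted, the lookbehind form is rejected.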

There's our culprit. If you want, you can try Python's regex module, which seems to support variable-length lookbehind.



Source: https://stackoverflow.com/questions/30678150/pcre-backreferences-not-allowed-in-lookbehinds
