Variable-Width Lookbehind Issue in Python

Posted by 点点圈 on 2020-01-11 01:26:28

Question


I have the following scenarios:

1) car on the right shoulder
2) car on the left shoulder
3) car on the shoulder

I want to match "shoulder" only when it is not preceded by "left" or "right", so only 3) should return "shoulder".

re.compile(r'(?<!left|right\s*)shoulder')
sre_constants.error: look-behind requires fixed-width pattern

It seems that I can't use \s* or "|" inside the lookbehind.

How can I solve this?

Thanks in advance!


Answer 1:


regex module: variable-width lookbehind

In addition to the answer by HamZa: for a regex of any complexity in Python, I recommend the outstanding regex module by Matthew Barnett. It supports infinite-width lookbehind, which makes it one of the few engines to do so, along with .NET and JGSoft.

This allows you to write, for instance:

import regex
if regex.search("(?<!right |left )shoulder", "left shoulder"):
    print("It matches!")
else:
    print("Nah... No match.")

You could also use \s+ if you wished.

Output (no match, because "shoulder" is preceded by "left "):

Nah... No match.
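
As a rough sketch of how a variable-width lookbehind behaves on the three sentences from the question, the snippet below puts \s* inside the lookbehind. It assumes the third-party regex package is installed (pip install regex); the helper name find_bare_shoulder and the fallback to None when the package is missing are illustrative choices, not part of the original answer.

```python
try:
    import regex  # third-party package: pip install regex
except ImportError:  # keep the sketch importable without the package
    regex = None

# Variable-width negative lookbehind: reject "shoulder" when it is preceded
# by "left" or "right" followed by any amount of whitespace.
PATTERN = r"(?<!(?:left|right)\s*)shoulder"


def find_bare_shoulder(text):
    """Return the matched word, or None if nothing matches (or regex is missing)."""
    if regex is None:
        return None
    match = regex.search(PATTERN, text)
    return match.group(0) if match else None


if __name__ == "__main__":
    for sentence in (
        "car on the right shoulder",
        "car on the left shoulder",
        "car on the shoulder",
    ):
        print(sentence, "->", find_bare_shoulder(sentence))
```

With the package installed, only the third sentence should yield a match.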



Answer 2:


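In most regex engines, a lookbehind needs to be of fixed width, so variable-width quantifiers such as +, * and ? cannot appear inside one. Python's re module is stricter still: every alternative inside a lookbehind must have the same width, so (?<!left|right) is rejected as well ("left" is four characters, "right" is five). The solution is to give each word its own lookbehind and move \s* outside:

(?<!left)(?<!right)\s*shoulder

You will notice that this expression matches every combination: \s* may match zero characters, and the text right before "shoulder" then ends in a space rather than in "left" or "right". So we need to change the quantifier from * to +:

(?<!left)(?<!right)\s+shoulder

The only problem with this solution is that it won't find shoulder if it's at the beginning of the string, so we might add an alternative with an anchor:

^shoulder|(?<!left)(?<!right)\s+shoulder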

If you want to get rid of the leading whitespace in the match, just call strip() on the matched string.
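
Here is a minimal sketch of this approach using only the standard library. Because Python's re requires every alternative in a lookbehind to have the same width, the sketch uses one lookbehind for "left" and one for "right". The helper names find_bare_shoulder and compiles are made up for illustration.

```python
import re

# Python's re rejects alternatives of different widths inside a lookbehind,
# so "left" (4 chars) and "right" (5 chars) each get their own lookbehind.
PATTERN = re.compile(r"^shoulder|(?<!left)(?<!right)\s+shoulder")


def find_bare_shoulder(text):
    """Return "shoulder" when it is not preceded by left/right, else None."""
    match = PATTERN.search(text)
    # strip() drops the whitespace consumed by the quantifier outside the lookbehind
    return match.group(0).strip() if match else None


def compiles(pattern):
    """Report whether the re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


if __name__ == "__main__":
    for sentence in (
        "car on the right shoulder",
        "car on the left shoulder",
        "car on the shoulder",
        "shoulder pads",
    ):
        print(sentence, "->", find_bare_shoulder(sentence))
    # The single-lookbehind alternation is not accepted by re
    print(compiles(r"(?<!left|right)\s+shoulder"))
```

One caveat: this assumes a single space between words, as in the question. With two spaces ("left  shoulder") the match can start at the second space, where the lookbehinds no longer see "left", so the pattern would match.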




Source: https://stackoverflow.com/questions/24987403/variable-width-lookbehind-issue-in-python
