Positive lookbehind vs non-capturing group: different behaviour

安稳与你 submitted on 2020-02-27 07:35:23

Question


I use Python regular expressions (the re module) in my code and noticed different behaviour in these cases:

import re
re.findall(r'\s*(?:[a-z]\))?[^.)]+', 'a) xyz. b) abc.') # non-capturing group
# results in ['a) xyz', ' b) abc']

and

re.findall(r'\s*(?<=[a-z]\))?[^.)]+', 'a) xyz. b) abc.') # lookbehind
# results in ['a', ' xyz', ' b', ' abc']

What I need to get is just ['xyz', 'abc']. Why do the examples behave differently, and how do I get the desired result?


Answer 1:


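The reason a and b are included in the second case is that lookarounds don't consume any characters, and you have made (?<=[a-z]\)) optional. At the start of the string there is nothing behind the current position, so the lookbehind cannot succeed, but since it is optional the engine simply skips it. Now [^.)]+ matches a.

The match stops at ), which [^.)] excludes, and no match can start at ) itself. The search moves on to the space, which \s* takes. The optional lookbehind is skipped again and [^.)]+ matches xyz, which gives ' xyz'.

The same thing is repeated with b) abc.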

Remove the ? from the second pattern and you would get the expected result, i.e. ['xyz', 'abc'].
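
The walkthrough above can be checked with a short sketch. It rests on one observation: a zero-width assertion that is made optional can always be skipped, so the question's second pattern should behave like the same pattern with the lookbehind deleted. The variable names are illustrative and not part of the original post.

```python
import re

text = 'a) xyz. b) abc.'

# Output the question reports for the optional lookbehind pattern.
reported = ['a', ' xyz', ' b', ' abc']

# Same pattern with the lookbehind deleted entirely. If the optional
# lookbehind never constrains the match, the two results should agree.
without_lookbehind = re.findall(r'\s*[^.)]+', text)

print(without_lookbehind)              # ['a', ' xyz', ' b', ' abc']
print(without_lookbehind == reported)  # True
```

The two lists agree, which is consistent with the optional lookbehind having no effect on what gets matched.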




Answer 2:


The regex you are looking for is:

re.findall(r'(?<=[a-z]\) )[^) .]+', 'a) xyz. b) abc.')

I believe the currently accepted answer by Anirudha explains the difference between your use of the positive lookbehind and the non-capturing group well. However, the suggestion of removing the ? after the positive lookbehind actually results in [' xyz', ' abc'] (note the included spaces).

This is because the positive lookbehind does not account for the space after the ), and the main character class [^.)] does not exclude spaces either, so the space becomes part of the match.
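
To see the effect of the space handling, the sketch below first runs the pattern with the mandatory lookbehind, then the pattern from this answer. The last call is an alternative that does not come from the thread: it assumes a capturing group is acceptable, which lets \s* absorb any amount of whitespace. A lookbehind in Python's re module cannot do that, because it must be fixed-width. Variable names are illustrative.

```python
import re

text = 'a) xyz. b) abc.'

# Mandatory lookbehind without the space: the match starts right after ")",
# so the space is picked up by [^.)]+.
with_space = re.findall(r'\s*(?<=[a-z]\))[^.)]+', text)
print(with_space)  # [' xyz', ' abc']

# Space moved into the lookbehind and excluded from the character class.
no_space = re.findall(r'(?<=[a-z]\) )[^) .]+', text)
print(no_space)  # ['xyz', 'abc']

# Alternative (an assumption, not from the thread): consume the label and
# any whitespace normally, and let findall return only the captured group.
captured = re.findall(r'[a-z]\)\s*([^.)]+)', text)
print(captured)  # ['xyz', 'abc']
```

One design difference is worth keeping in mind: [^) .]+ stops at the first space, so for multi-word items such as 'a) foo bar.' the lookbehind version would return only the first word, while the capturing version keeps the whole item.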



Source: https://stackoverflow.com/questions/14692395/positive-lookbehind-vs-non-capturing-group-different-behaviuor
