python re.findall() with substring in alternations

那年仲夏 submitted on 2021-02-17 05:45:19

Question


If I have a substring (or 'subpattern') of another string or pattern in a regex alternation, like so:

r'abcd|bc'

What is the expected behaviour of re.compile(r'abcd|bc').findall('abcd bcd bc ab')?

Trying it out, I get (as expected)

['abcd', 'bc', 'bc']

so I thought re.compile(r'bc|abcd').findall('abcd bcd bc ab') might yield ['bc', 'bc', 'bc'] but instead it again returns

['abcd', 'bc', 'bc']

Can someone explain this? I was under the impression that findall would greedily return matches, but apparently it backtracks and tries to match alternative patterns that would yield longer tokens.


Answer 1:


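No backtracking takes place at all. Your pattern matches two different types of strings; | means or. The engine scans the input from left to right and tries each alternative at each position, so the earliest position where any alternative matches produces the next result.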

So when the expression finds abcd at the start of your input, that text matches your pattern just fine: it fits the abcd part of the (bc or abcd) pattern you gave it.
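
As a rough illustration of that scan, the sketch below uses re.finditer to report where each match starts for both orderings of the pattern. The helper name show_matches is made up for this example and is not part of the re module.

```python
import re

def show_matches(pattern, text):
    """Return (start, matched_text) pairs for every non-overlapping match."""
    return [(m.start(), m.group()) for m in re.finditer(pattern, text)]

text = 'abcd bcd bc ab'

# Both orderings find the same matches at the same positions.
first = show_matches(r'abcd|bc', text)
second = show_matches(r'bc|abcd', text)

print(first)   # [(0, 'abcd'), (5, 'bc'), (9, 'bc')]
print(second)  # [(0, 'abcd'), (5, 'bc'), (9, 'bc')]
```

The first match starts at position 0, where only abcd can match because the character there is a, not b. The search then resumes after that match, so the bc inside abcd is never reported separately.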

Ordering of the alternatives doesn't come into play here: bc and abcd begin with different characters, so they can never both match starting at the same position, and for this input abcd|bc gives the same result as bc|abcd. abcd is not disregarded just because bc might match later on in the string.
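
Ordering does become visible when two alternatives can match starting at the same position, which never happens with abcd and bc. The sketch below shows this with a different pattern, ab|abcd, chosen only for illustration and not taken from the question.

```python
import re

text = 'abcd'

# 'ab' is a prefix of 'abcd', so both alternatives can match at position 0.
# Python's re tries alternatives left to right and accepts the first that matches.
short_first = re.findall(r'ab|abcd', text)
long_first = re.findall(r'abcd|ab', text)

print(short_first)  # ['ab']
print(long_first)   # ['abcd']
```

With ab listed first, the engine accepts ab at position 0 and the remaining cd matches nothing, so the longer alternative is never used.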



Source: https://stackoverflow.com/questions/20647431/python-re-findall-with-substring-in-alternations
