Question
If I have a substring (or 'subpattern') of another string or pattern in a regex alternation, like so:
r'abcd|bc'
What is the expected behaviour of re.compile(r'abcd|bc').findall('abcd bcd bc ab')?
Trying it out, I get (as expected)
['abcd', 'bc', 'bc']
so I thought re.compile(r'bc|abcd').findall('abcd bcd bc ab')
might yield ['bc', 'bc', 'bc']
but instead it again returns
['abcd', 'bc', 'bc']
Can someone explain this? I was under the impression that findall would greedily return matches, but apparently it backtracks and tries to match alternate patterns that would yield longer tokens.
Answer 1:
No backtracking takes place at all. Your pattern matches two different types of strings; | means or. Each pattern is tried out at each position.
So when the expression finds abcd at the start of your input, that text matches your pattern just fine; it fits the abcd part of the (bc or abcd) pattern you gave it.
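To see where each match starts, here is a minimal sketch using re.finditer on the input from the question. The helper name matches_with_positions is only illustrative and is not part of the re module.

```python
import re

def matches_with_positions(pattern, text):
    # Collect (start index, matched text) for every non-overlapping match,
    # scanning the text from left to right.
    return [(m.start(), m.group()) for m in re.finditer(pattern, text)]

text = 'abcd bcd bc ab'
for pattern in (r'abcd|bc', r'bc|abcd'):
    print(pattern, matches_with_positions(pattern, text))
```

Both patterns report a match at index 0, which is where abcd begins. bc cannot match there because the character at index 0 is a, and once abcd has been consumed the scan resumes after it.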
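The ordering of the alternatives makes no difference here. Python's re module tries the alternatives left to right at each position, but abcd and bc begin with different characters, so they can never both match at the same starting position; for this input, abcd|bc gives the same result as bc|abcd. abcd is not disregarded just because bc might match later on in the string.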
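As a rough illustration, the sketch below contrasts the question's patterns with a hypothetical pair, ab and abcd, that is not from the question. Those two can both match at the same starting position, so for them the left-to-right order of the alternatives does change the result.

```python
import re

text = 'abcd bcd bc ab'

# The question's alternatives start with different characters,
# so swapping them gives the same matches.
question_order_a = re.findall(r'abcd|bc', text)
question_order_b = re.findall(r'bc|abcd', text)
print(question_order_a == question_order_b)

# Hypothetical alternatives that share a starting position:
# the leftmost alternative that matches is the one that is used.
short_first = re.findall(r'ab|abcd', 'abcd')
long_first = re.findall(r'abcd|ab', 'abcd')
print(short_first)
print(long_first)
```

When one alternative is a prefix of another, putting the longer one first is the usual way to make sure it is preferred.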
Source: https://stackoverflow.com/questions/20647431/python-re-findall-with-substring-in-alternations