Non-greedy in Python Regex

Submitted by 穿精又带淫゛_ on 2019-12-24 13:24:38

Question


I am trying to understand non-greedy regexes in Python, but I don't understand why the following examples give these results:

>>> import re
>>> print(re.search('a??b', 'aaab').group())
ab
>>> print(re.search('a*?b', 'aaab').group())
aaab

I thought it would be 'b' for the first and 'ab' for the second. Can anyone explain that?


Answer 1:


This happens because the matches you expect start later in the string than the match that is actually found. If you follow how the matching for a??b proceeds from left to right, you'll see something like this:

  • Start at index 0 (aaab), try zero a then b: no match (b != a)
  • Start at index 0 (aaab), try one a then b: no match (ab != aa)
  • Start at index 1 (aab), try zero a then b: no match (b != a)
  • Start at index 1 (aab), try one a then b: no match (ab != aa)
  • Start at index 2 (ab), try zero a then b: no match (b != a)
  • Start at index 2 (ab), try one a then b: match (ab == ab)

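The same reasoning applies to *?: starting at index 0, the engine tries zero, one, two, and then three a's before b, and the first attempt that succeeds is aaab.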
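The walk-through above can be reproduced with a small sketch. The helper name first_match_by_position is hypothetical and only approximates what re.search does internally; it anchors the pattern at each start index in turn using the pos argument of a compiled pattern's match method, and stops at the first index where a match exists.

```python
import re

def first_match_by_position(pattern, text):
    """Try the pattern at each start index, left to right (illustrative helper)."""
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):
        # match() only succeeds if the pattern matches beginning exactly at `start`
        m = compiled.match(text, start)
        if m:
            print(f"start={start}: matched {m.group()!r}")
            return m
        print(f"start={start}: no match")
    return None

lazy_optional = first_match_by_position('a??b', 'aaab')
lazy_star = first_match_by_position('a*?b', 'aaab')
```

For a??b the first successful start index is 2, while for a*?b it is already 0, which is why the second pattern returns the whole string.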

The search function returns the leftmost match. Using ?? and *? only makes the engine prefer the shortest match at that leftmost starting position; it will not return a shorter match that starts to the right of a match that has already been found.

Also note that the re module doesn't return overlapping matches, so even using findall or finditer you will not be able to find the two matches you are looking for.
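A short check of that claim, followed by one possible workaround. The workaround wraps the pattern in a capturing lookahead so that each start position is tested without consuming characters; this is a common idiom and not something the answer above proposes, so treat it as an illustrative assumption rather than the recommended fix.

```python
import re

text = 'aaab'

# findall and finditer only report non-overlapping matches
plain = re.findall('a??b', text)
spans = [m.span() for m in re.finditer('a??b', text)]
print(plain, spans)

# Workaround sketch: a zero-width lookahead with a capture group
# reports the match that starts at every position, overlaps included
overlapping_optional = re.findall('(?=(a??b))', text)
overlapping_star = re.findall('(?=(a*?b))', text)
print(overlapping_optional)
print(overlapping_star)
```

With the lookahead version, both 'ab' and 'b' show up for a??b, because the pattern is tried independently at index 2 and index 3.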




Answer 2:


It's because ?? is lazy while ? is greedy. A lazy quantifier matches its preceding token zero or one time, and it chooses zero whenever that still allows the overall pattern to match. For example, all of the following return an empty string:

>>> print(re.search('a??','a').group())

>>> print(re.search('a??','aa').group())

>>> print(re.search('a??','aaaa').group())

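To make the contrast with the greedy form concrete, here is a minimal sketch that runs both a?? and a? over the same inputs. The variable names are illustrative only.

```python
import re

results = {}
for text in ('a', 'aa', 'aaaa'):
    lazy = re.search('a??', text).group()    # prefers zero repetitions
    greedy = re.search('a?', text).group()   # prefers one repetition
    results[text] = (lazy, greedy)
    print(f"{text!r}: lazy={lazy!r} greedy={greedy!r}")
```

With nothing after the quantifier, the lazy form is always satisfied by the empty string, while the greedy form takes one a.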
The regex a??b can match either ab or b:

>>> print(re.search('a??b','aaab').group())
ab
>>> print(re.search('a??b','aacb').group())
b

If the overall pattern cannot match, for example because there is no b in the string, search returns None, so calling .group() on the result raises an AttributeError:

>>> print(re.search('a??b','aac').group())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
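One way to avoid that traceback is to check the result before calling .group(). The helper matched_text below is a hypothetical name used only for this sketch.

```python
import re

def matched_text(pattern, text):
    """Return the matched substring, or None when the pattern does not match."""
    m = re.search(pattern, text)
    return m.group() if m is not None else None

no_b = matched_text('a??b', 'aac')     # no b anywhere, so no match
with_b = matched_text('a??b', 'aacb')  # the b is preceded by c, so zero a's are used
print(no_b, with_b)
```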

As for the second part, a*?b is a non-greedy regex, but the result follows the same logic. Starting at the leftmost position, it matches as few a's as still allow b to match, which here means all three of them followed by b:

print(re.search('a*?b','aaab').group())
aaab
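On 'aaab' the lazy and greedy forms happen to give the same answer, because only one match can start at index 0. The sketch below adds a second input, chosen here purely for illustration, where the two forms differ because more than one match can start at the same position.

```python
import re

# Same result: only one possible match starts at index 0
same_lazy = re.search('a*?b', 'aaab').group()
same_greedy = re.search('a*b', 'aaab').group()

# Different result: from index 0, both 'aab' and 'aabab' are possible
diff_lazy = re.search('a.*?b', 'aabab').group()   # stops at the first b it can
diff_greedy = re.search('a.*b', 'aabab').group()  # runs to the last b

print(same_lazy, same_greedy)
print(diff_lazy, diff_greedy)
```

Laziness changes where a match ends, not where it starts.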



Answer 3:


Explanation for the Pattern - /a??b/

a matches the character a literally (case sensitive). The quantifier ?? means between zero and one time, as few times as possible, expanding as needed [lazy]. The character b must then match literally (case sensitive).

So it will match the last two characters, 'ab', in the given string 'aaab'.

And for the pattern - /a*?b/

a matches the character 'a' literally (case sensitive). Here the quantifier *? means between zero and unlimited times, as few times as possible, expanding as needed [lazy]. The character b must then match literally (case sensitive).

So it will match 'aaab' as a whole in 'aaab', because the match starts at the first a and has to expand until it reaches b.



Source: https://stackoverflow.com/questions/31357459/non-greedy-in-python-regex
