How do capture groups work? (wrt python regular expressions)

Posted by 微笑、不失礼 on 2021-01-27 18:19:53

Question


While using regex to help solve a problem in the Python Challenge, I came across some behaviour that confused me.

From the Python re documentation:

(...) Matches whatever regular expression is inside the parentheses.

and

'+' Causes the resulting RE to match 1 or more repetitions of the preceding RE.

So this makes sense:

>>> import re
>>> re.findall(r"(\d+)", "1111112")
['1111112']

But this doesn't:

>>> re.findall(r"(\d)+", "1111112")
['2']

I realise that findall returns only the groups when groups are present in the regex, but why is only the '2' returned? What happens to all the 1's in the match?


Answer 1:


Because you only have one capturing group, but it's "run" repeatedly, the new matches are repeatedly entered into the "storage space" for that group. In other words, the 1s were lost when they were "overwritten" by subsequent 1s and eventually the 2.
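
To see the "overwriting" directly, here is a small sketch that uses re.match instead of findall, so the match object can be inspected. The variable names are only illustrative and are not from the original post.

```python
import re

# Same pattern as in the question: one capturing group, repeated by '+'.
m = re.match(r"(\d)+", "1111112")

whole_match = m.group(0)    # everything the pattern consumed
last_capture = m.group(1)   # what the group held after its final repetition
span_of_group = m.span(1)   # where that final repetition sits in the string

print(whole_match)      # 1111112
print(last_capture)     # 2
print(span_of_group)    # (6, 7)
```

The whole match still covers all seven digits; only the group's stored text is limited to its last repetition, which is what findall reports.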




Answer 2:


You are repeating the group itself by appending '+' after ')'. I do not know the implementation details, but the group matches 7 times and only the last match is returned.

In the first pattern, you are matching all 7 digits and making the whole run a single group.
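
If the goal is to keep every digit, or the whole run, a few alternatives can be sketched as follows. This is only one possible set of options, and the variable names are made up for illustration.

```python
import re

text = "1111112"

# Non-capturing group: with no capturing groups, findall returns whole matches.
whole_runs = re.findall(r"(?:\d)+", text)

# No repetition at all: every digit is its own match.
each_digit = re.findall(r"\d", text)

# Outer group captures the full run; inner group still keeps only its last repetition.
both = re.findall(r"((\d)+)", text)

print(whole_runs)   # ['1111112']
print(each_digit)   # ['1', '1', '1', '1', '1', '1', '2']
print(both)         # [('1111112', '2')]
```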



Source: https://stackoverflow.com/questions/861060/how-do-capture-groups-work-wrt-python-regular-expressions
