Use of findall and parenthesis in Python

Submitted by 浪尽此生 on 2021-02-07 07:31:12

Question


I need to extract all letters after the + sign or at the beginning of a string like this:

formula = "X+BC+DAF"

I tried the following, but I do not want the + sign in the result. I would like to get only ['X', 'B', 'D'].

>>> import re
>>> re.findall("^[A-Z]|[+][A-Z]", formula)
['X', '+B', '+D']

When I grouped with parentheses, I got this strange result:

>>> re.findall("^([A-Z])|[+]([A-Z])", formula)
[('X', ''), ('', 'B'), ('', 'D')]

Why did it create tuples when I tried to group? How can I write the regexp directly so that it returns ['X', 'B', 'D']?


Answer 1:


If there are any capturing groups in the regular expression, re.findall returns only the values captured by the groups. If there are no groups, the entire matched string is returned. The documentation describes it like this:

re.findall(pattern, string, flags=0)

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
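To make the three cases from the documentation concrete, here is a small sketch that runs the patterns from this thread against the same formula. The variable names are only illustrative; the last line shows one possible way to flatten the tuples if you do want to keep two groups.

```python
import re

formula = "X+BC+DAF"

# No capturing groups: each item is the whole match.
no_groups = re.findall(r"^[A-Z]|\+[A-Z]", formula)

# Exactly one capturing group: each item is that group's text.
one_group = re.findall(r"(?:^|\+)([A-Z])", formula)

# Two capturing groups: each item is a tuple with one slot per group.
# A group that did not take part in the match shows up as ''.
two_groups = re.findall(r"^([A-Z])|\+([A-Z])", formula)

# Only one slot per tuple is non-empty here, so `a or b` picks it.
flattened = [a or b for a, b in two_groups]

print(no_groups)   # ['X', '+B', '+D']
print(one_group)   # ['X', 'B', 'D']
print(two_groups)  # [('X', ''), ('', 'B'), ('', 'D')]
print(flattened)   # ['X', 'B', 'D']
```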


How to write the regexp directly such that it returns ['X', 'B', 'D'] ?

Use a non-capturing group for the alternation and keep a single capturing group around the letter, so that findall returns a flat list of strings:

>>> re.findall(r"(?:^|\+)([A-Z])", formula)
['X', 'B', 'D']

Or, for this specific case, you could try a simpler solution using a word boundary, which works here because + is not a word character:

>>> re.findall(r"\b[A-Z]", formula)
['X', 'B', 'D']

Or a solution using str.split that doesn't use regular expressions:

>>> [s[0] for s in formula.split('+')]
['X', 'B', 'D']
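The three approaches agree on the sample formula, but they are not equivalent in general. The sketch below wraps each one in a function (the function names and the extra test strings are made up for illustration) and tries an input with a different separator and one with an empty segment. The split version has an added `if s` filter, since `s[0]` would raise IndexError on an empty segment.

```python
import re

def first_letters_group(formula):
    # Letter at the start of the string or right after a '+'.
    return re.findall(r"(?:^|\+)([A-Z])", formula)

def first_letters_boundary(formula):
    # Letter at any word boundary, not only after '+'.
    return re.findall(r"\b[A-Z]", formula)

def first_letters_split(formula):
    # First character of each non-empty '+'-separated segment.
    # Note: this does not check that the character is a letter.
    return [s[0] for s in formula.split("+") if s]

sample = "X+BC+DAF"
other_separator = "X+BC-DAF"   # hypothetical input containing '-'
empty_segment = "X++B"         # hypothetical input with an empty segment

for f in (first_letters_group, first_letters_boundary, first_letters_split):
    print(f.__name__, f(sample), f(other_separator), f(empty_segment))
```

On the sample all three return ['X', 'B', 'D']. With the '-' separator only the word-boundary version also picks up 'D', because \b matches after any non-word character.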


Source: https://stackoverflow.com/questions/13840883/use-of-findall-and-parenthesis-in-python
