可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
In my snippet below, the non-capturing group "(?:aaa)"
should be ignored in matching result, so the result should be "_bbb"
only.
However, I get "aaa_bbb"
in matching result; only when I specify group(2) does it show "_bbb"
.
import re string1 = "aaa_bbb" print(re.match(r"(?:aaa)(_bbb)", string1).group()) >>> aaa_bbb
回答1:
group()
and group(0)
will return the entire match. Subsequent groups are actual capture groups.
>>> print (re.match(r"(?:aaa)(_bbb)", string1).group(0)) aaa_bbb >>> print (re.match(r"(?:aaa)(_bbb)", string1).group(1)) _bbb >>> print (re.match(r"(?:aaa)(_bbb)", string1).group(2)) Traceback (most recent call last): File "", line 1, in ? IndexError: no such group
回答2:
I think you're misunderstanding the concept of a "non-capturing group". The text matched by a non-capturing group still becomes part of the overall regex match.
Both the regex (?:aaa)(_bbb)
and the regex (aaa)(_bbb)
return aaa_bbb
as the overall match. The difference is that the first regex has one capturing group which returns _bbb
as its match, while the second regex has two capturing groups that return aaa
and _bbb
as their respective matches. In your Python code, to get _bbb
, you'd need to use group(1)
with the first regex, and group(2)
with the second regex.
The main benefit of non-capturing groups is that you can add them to a regex without upsetting the numbering of the capturing groups in the regex. They also offer (slightly) better performance as the regex engine doesn't have to keep track of the text matched by non-capturing groups.
If you really want to exclude aaa
from the overall regex match then you need to use lookaround. In this case, positive lookbehind does the trick: (?. With this regex, group()
returns _bbb
in Python. No capturing groups needed.
My recommendation is that if you have the ability to use capturing groups to get part of the regex match, use that method instead of lookaround.
回答3:
TFM:
class re.MatchObject
group([group1, ...])
Returns one or more subgroups of the match. If there is a single argument, the result is a single string; if there are multiple arguments, the result is a tuple with one item per argument. Without arguments, group1 defaults to zero (the whole match is returned). If a groupN argument is zero, the corresponding return value is the entire matching string.
回答4:
Try:
print(re.match(r"(?:aaa)(_bbb)", string1).group(1))
group()
is same as group(0)
and Group 0
is always present and it's the whole RE match.
回答5:
You have to specify group(1)
to get just the part captured by the parenthesis (_bbb
in this case).
group()
without parameters will return the whole string the complete regular expression matched, no matter if some parts of it were additionally captured by parenthesis or not.
回答6:
Use the groups method on the match object instead of group. It returns a list of all capture buffers. The group method with no argument is returning the entire match of the regular expression.