I have a file that includes a bunch of strings like "size=XXX;". I am trying Python's re module for the first time and am a bit mystified by the following behavior: if I
When a regular expression contains parentheses, they capture their contents into groups, and findall() then returns only those groups instead of the full matches. Here's the relevant section from the docs:
(...)
Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the \number special sequence, described below. To match the literals '(' or ')', use \( or \), or enclose them inside a character class: [(] [)].
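As a rough illustration of the capturing behaviour, here is a small sketch. The `sample` string is made up for this example and only stands in for the real file contents, which aren't shown in the question.

```python
import re

# Hypothetical sample text standing in for the contents of the file.
sample = "size=51; size=50; size=49; size=50;"

# One capturing group: findall() returns only what the group matched.
captured = re.findall(r'size=(50|51);', sample)
print(captured)  # ['51', '50', '50']

# No parentheses at all: findall() returns each full match.
whole = re.findall(r'size=5[01];', sample)
print(whole)  # ['size=51;', 'size=50;', 'size=50;']
```

The character class in the second pattern only works here because both alternatives differ in a single character; for general alternation you still need some kind of group.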
To avoid this behaviour, you can use a non-capturing group:
>>> print(re.findall(r'size=(?:50|51);', myfile))
['size=51;', 'size=51;', 'size=51;', 'size=50;', 'size=50;', 'size=50;', 'size=50;']
Again, from the docs:
(?:...)
A non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.
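If you want both the full match and the captured part, one option is to keep the capturing group and iterate over match objects with re.finditer() instead. The sketch below compares the two approaches; the `sample` string is an assumption made up for illustration, not the asker's actual data.

```python
import re

# Hypothetical sample text standing in for the contents of the file.
sample = "size=51; size=50; size=49; size=50;"

# Non-capturing group: the pattern has no groups, so findall()
# returns the full text of each match.
non_capturing = re.findall(r'size=(?:50|51);', sample)
print(non_capturing)  # ['size=51;', 'size=50;', 'size=50;']

# Capturing group with finditer(): group(0) is the whole match,
# group(1) is the text captured by the first pair of parentheses.
pairs = [(m.group(0), m.group(1))
         for m in re.finditer(r'size=(50|51);', sample)]
print(pairs)  # [('size=51;', '51'), ('size=50;', '50'), ('size=50;', '50')]
```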