I have this piece of code:
for n in (range(1,10)):
    new = re.sub(r'(regex(group)regex)?regex', r'something'+str(n)+r'\1', old, count=1)
Before Python 3.5, backreferences to failed capture groups in Python re.sub were not populated with an empty string. Here is the Bug 1519638 description at bugs.python.org. Thus, using a backreference to a group that did not participate in the match resulted in an error (sre_constants.error: unmatched group).
There are two ways to fix that issue.
First, you can replace all optional capturing groups (constructs like (\d+)?) with obligatory ones that have an empty alternative (i.e. (\d+|)).
Here is an example of the failure:
import re
old = 'regexregex'
new = re.sub(r'regex(group)?regex', r'something\1something', old)
print(new)
Replacing the re.sub line with
new = re.sub(r'regex(group|)regex', r'something\1something', old)
makes it work: the code prints somethingsomething.
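To see why the empty alternative helps, it may be useful to inspect the match objects directly. The sketch below reuses the placeholder pattern and the `old` string from the example above; the names `optional` and `obligatory` are only illustrative.

```python
import re

old = 'regexregex'

# Optional group: it does not participate, so the match object reports None.
optional = re.match(r'regex(group)?regex', old)
# Obligatory group with an empty alternative: it participates and matches ''.
obligatory = re.match(r'regex(group|)regex', old)

print(repr(optional.group(1)))    # None
print(repr(obligatory.group(1)))  # ''

# With the empty alternative, \1 always refers to a group that matched.
new = re.sub(r'regex(group|)regex', r'something\1something', old)
print(new)  # somethingsomething
```

Because group 1 always takes part in the match, the backreference never points at an unset group.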
Second, you can use a lambda in the replacement part to check whether the group is initialized (i.e. not None), with lambda m: m.group(n) or ''. This approach is necessary if you have optional groups inside another optional group. Use this solution in your case, because you have two backreferences, #3 and #4, in the replacement pattern, but some matches (see Match 1 and 3) do not have Capture group 3 initialized. This happens because the first alternative of the whole first part, (\s*\{{2}funcA(ka|)\s*\|\s*([^}]*)\s*\}{2}\s*|), does not participate in those matches, so the inner Capture group 3 (i.e. ([^}]*)) just does not get populated even after adding an empty alternative.
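A reduced illustration of the nested case may help. The pattern `(a(b|)|)c` below is a made-up stand-in for the real one, not taken from the question: group 2 sits inside group 1, and both have empty alternatives.

```python
import re

# Hypothetical reduced pattern: group 2 is nested inside group 1.
pattern = r'(a(b|)|)c'

m = re.match(pattern, 'c')
# Group 1 matched its empty alternative; group 2 was never reached.
print(m.groups())  # ('', None)

# A callable replacement can substitute '' for the unset inner group.
result = re.sub(pattern, lambda m: '[' + (m.group(2) or '') + ']', 'c')
print(result)  # []

# When the inner group does match, its text is used as usual.
print(re.sub(pattern, lambda m: '[' + (m.group(2) or '') + ']', 'abc'))  # [b]
```

The outer group is satisfied by its empty branch, so the regex engine never enters the branch that contains the inner group, and only the `or ''` check covers that case.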
re.sub(r'(?i)(\s*\{{2}funcA(ka|)\s*\|\s*([^\}]*)\s*\}{2}\s*|)\{{2}funcB\s*\|\s*([^\}]*)\s*\}{2}\s*',
r"\n | funcA"+str(n)+r" = \3\n | funcB"+str(n)+r" = \4\n | string"+str(n)+r" = \n",
text,
count=1)
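should be rewritten as
re.sub(r'(?i)(\s*{{funcA(ka|)\s*\|\s*([^}]*)\s*}}\s*|){{funcB\s*\|\s*([^}]*)\s*}}\s*',
    lambda m: "\n | funcA" + str(n) + " = " + (m.group(3) or '') + "\n | funcB" + str(n) + " = " + (m.group(4) or '') + "\n | string" + str(n) + " = \n",
    text,
    count=1)
The string pieces inside the lambda are ordinary (non-raw) literals, because the return value of a callable is inserted as is, without escape processing.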
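One thing to watch when moving from a replacement template to a callable: re.sub processes backslash escapes in a template string, but it inserts a callable's return value literally. The sketch below uses a made-up `source` string to show the difference.

```python
import re

source = 'key=value'

# String template: re.sub processes the escape, so \n becomes a newline.
via_template = re.sub(r'=', r'\n', source)

# Callable returning a raw string: a backslash and 'n' are inserted literally.
via_raw_callable = re.sub(r'=', lambda m: r'\n', source)

# Callable returning an ordinary string: a real newline is inserted.
via_callable = re.sub(r'=', lambda m: '\n', source)

print(repr(via_template))      # 'key\nvalue'
print(repr(via_raw_callable))  # 'key\\nvalue'
print(repr(via_callable))      # 'key\nvalue'
```

So a raw `r"\n"` that was correct inside a template should become a plain `"\n"` once the replacement is a function.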
See IDEONE demo
import re
text = r'''
{{funcB|param1}}
*some string*
{{funcA|param2}}
{{funcB|param3}}
*some string2*
{{funcB|param4}}
*some string3*
{{funcAka|param5}}
{{funcB|param6}}
*some string4*
'''
for n in range(1, text.count('funcB') + 1):
    # The callable's return value is inserted literally, so use "\n", not r"\n"
    text = re.sub(r'(?i)(\s*\{{2}funcA(ka|)\s*\|\s*([^\}]*)\s*\}{2}\s*|)\{{2}funcB\s*\|\s*([^\}]*)\s*\}{2}\s*',
        lambda m: "\n | funcA"+str(n)+" = "+(m.group(3) or '')+"\n | funcB"+str(n)+" = "+(m.group(4) or '')+"\n | string"+str(n)+" = \n",
        text,
        count=1)
# Expected result, written line by line so blank lines and trailing spaces stay visible
expected = '\n'.join([
    '',
    '',
    ' | funcA1 = ',
    ' | funcB1 = param1',
    ' | string1 = ',
    '*some string*',
    ' | funcA2 = param2',
    ' | funcB2 = param3',
    ' | string2 = ',
    '*some string2*',
    '',
    ' | funcA3 = ',
    ' | funcB3 = param4',
    ' | string3 = ',
    '*some string3*',
    ' | funcA4 = param5',
    ' | funcB4 = param6',
    ' | string4 = ',
    '*some string4*',
    '',
])
assert text == expected
print('ok')