Empty string instead of unmatched group error

萌比男神i 2020-11-29 11:21

I have this piece of code:

for n in range(1, 10):
    new = re.sub(r'(regex(group)regex)?regex', r'something' + str(n) + r'\1', old, count=1)
         


        
3 answers
  •  轻奢々 (original poster)
    2020-11-29 12:21

    Root cause

    Before Python 3.5, re.sub did not substitute an empty string for a backreference to a capture group that failed to match (see Bug 1519638 at bugs.python.org). As a result, a replacement template that referenced a group that did not participate in the match raised an error.
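    To see which behaviour a given interpreter has, a minimal sketch can wrap the call and report either the result or the error. The helper name try_backreference is made up for illustration; the pattern is the same toy pattern used in the example further down.

```python
import re

def try_backreference(text):
    """Hypothetical helper: substitute with a template that references optional Group 1."""
    try:
        return re.sub(r'regex(group)?regex', r'something\1something', text)
    except re.error as exc:
        # Interpreters older than 3.5 end up here when Group 1 did not participate
        return 'error: %s' % exc

print(try_backreference('regexgroupregex'))  # Group 1 matched
print(try_backreference('regexregex'))       # Group 1 did not match
```

    On Python 3.5 and later both calls return a substituted string; on older versions the second call should take the except branch instead.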

    There are two ways to fix that issue.

    Solution 1: Adding empty alternatives to make optional groups obligatory

    You can replace all optional capturing groups (those constructs like (\d+)?) with obligatory ones with an empty alternative (i.e. (\d+|)).

    Here is an example of the failure:

    import re
    old = 'regexregex'
    new = re.sub(r'regex(group)?regex', r'something\1something', old)
    print(new)
    

    Replace the re.sub line with

    new = re.sub(r'regex(group|)regex', r'something\1something', old)
    

    and it works: Group 1 now always participates in the match, as an empty string when "group" is absent.
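    The difference between the two patterns is visible on the match objects themselves. A small sketch, using only the toy patterns from the example above:

```python
import re

optional = re.match(r'regex(group)?regex', 'regexregex')
obligatory = re.match(r'regex(group|)regex', 'regexregex')

# The optional group was skipped entirely, so it is unset
print(optional.group(1))          # None
# The empty alternative matched, so the group holds an empty string
print(repr(obligatory.group(1)))  # ''
```

    An unset group is what old versions of re.sub refused to expand; an empty string is a perfectly valid group value.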

    Solution 2: Using lambda expression in the replacement and checking if the group is not None

    This approach is necessary if you have optional groups inside another optional group.

    You can pass a function as the replacement and check whether the group is initialized (not None), e.g. lambda m: m.group(n) or ''. Use this solution in your case: the replacement pattern has two backreferences, \3 and \4, but some matches (see Match 1 and 3, where no funcA template precedes funcB) do not have Capture group 3 initialized. This happens because the whole first part, (\s*\{{2}funcA(ka|)\s*\|\s*([^}]*)\s*\}{2}\s*|), matches only through its empty alternative, so the inner Capture group 3 (i.e. ([^}]*)) is never populated, even though the outer group has an empty alternative.

    re.sub(r'(?i)(\s*\{{2}funcA(ka|)\s*\|\s*([^\}]*)\s*\}{2}\s*|)\{{2}funcB\s*\|\s*([^\}]*)\s*\}{2}\s*', 
    r"\n | funcA"+str(n)+r" = \3\n | funcB"+str(n)+r" = \4\n | string"+str(n)+r" = \n", 
    text, 
    count=1)
    

    should be rewritten as

    re.sub(r'(?i)(\s*{{funcA(ka|)\s*\|\s*([^}]*)\s*}}\s*|){{funcB\s*\|\s*([^}]*)\s*}}\s*', 
    # The function's return value is inserted literally, so use "\n" (not r"\n") to get real newlines
    lambda m: "\n | funcA" + str(n) + " = " + (m.group(3) or '') + "\n | funcB" + str(n) + " = " + (m.group(4) or '') + "\n | string" + str(n) + " = \n", 
    text, 
    count=1)
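    The nested case can be reproduced in isolation. The sketch below uses a deliberately tiny stand-in pattern, (a(b|)|)c, which is not from the question but has the same shape: Group 2 sits inside Group 1, and both have empty alternatives.

```python
import re

# 'c' alone: Group 1 matches through its empty alternative, Group 2 is never entered
m = re.match(r'(a(b|)|)c', 'c')
print(m.groups())  # ('', None)

# A function replacement can fall back to '' for the unset inner group
result = re.sub(r'(a(b|)|)c',
                lambda m: '[' + (m.group(2) or '') + ']',
                'c abc')
print(result)  # [] [b]
```

    The empty alternative on the inner group does not help here, because the regex engine never reaches it when the outer group takes its own empty branch.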
    

    See IDEONE demo

    import re

    text = r'''

    {{funcB|param1}}
    *some string*
    {{funcA|param2}}
    {{funcB|param3}}
    *some string2*

    {{funcB|param4}}
    *some string3*
    {{funcAka|param5}}
    {{funcB|param6}}
    *some string4*
    '''

    # count=1 replaces one template per pass, so each replacement gets its own number n
    for n in range(1, text.count('funcB') + 1):
        text = re.sub(r'(?i)(\s*\{{2}funcA(ka|)\s*\|\s*([^\}]*)\s*\}{2}\s*|)\{{2}funcB\s*\|\s*([^\}]*)\s*\}{2}\s*',
        # Plain "\n" (not r"\n"): the function's return value is inserted literally
        lambda m: "\n | funcA" + str(n) + " = " + (m.group(3) or '') + "\n | funcB" + str(n) + " = " + (m.group(4) or '') + "\n | string" + str(n) + " = \n",
        text,
        count=1)

    expected = r'''
    | funcA1 =
    | funcB1 = param1
    | string1 =
    *some string*
    | funcA2 = param2
    | funcB2 = param3
    | string2 =
    *some string2*
    | funcA3 =
    | funcB3 = param4
    | string3 =
    *some string3*
    | funcA4 = param5
    | funcB4 = param6
    | string4 =
    *some string4*
    '''

    # Compare line by line, ignoring blank lines and leading/trailing whitespace
    def normalize(s):
        return [line.strip() for line in s.splitlines() if line.strip()]

    assert normalize(text) == normalize(expected)
    print('ok')
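    As a variation on the demo (not what the answer itself does), the numbering could also be done in a single re.sub pass by letting the replacement function pull numbers from a counter. This is only a sketch: number_templates and PATTERN are illustrative names, and the input is a shortened version of the demo text.

```python
import itertools
import re

PATTERN = re.compile(
    r'(?i)(\s*\{\{funcA(ka|)\s*\|\s*([^}]*)\s*\}\}\s*|)'
    r'\{\{funcB\s*\|\s*([^}]*)\s*\}\}\s*'
)

def number_templates(text):
    """Replace every funcA/funcB block in one pass, numbering matches from 1."""
    counter = itertools.count(1)

    def repl(m):
        n = next(counter)
        # m.group(3) is None when no funcA template precedes funcB
        return '\n | funcA%d = %s\n | funcB%d = %s\n | string%d = \n' % (
            n, m.group(3) or '', n, m.group(4) or '', n)

    return PATTERN.sub(repl, text)

sample = '{{funcB|p1}}\nx\n{{funcA|p2}}\n{{funcB|p3}}\ny\n'
print(number_templates(sample))
```

    This avoids rescanning the text once per template, at the cost of keeping a little state in the closure.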
