Empty string instead of unmatched group error

萌比男神i 2020-11-29 11:21

I have this piece of code:

import re

for n in range(1, 10):
    new = re.sub(r'(regex(group)regex)?regex', r'something' + str(n) + r'\1', old, count=1)
         


        
3 answers
  •  予麋鹿 (original poster)
    2020-11-29 12:18

    To simplify:

    Problem

    1. You are getting the error "sre_constants.error: unmatched group" from a Python 2.7 regex.
    2. Your regex pattern contains optional groups (with or without nested expressions), and you refer to those groups in the replacement argument of sub: re.sub(pattern, repl, string) or compiled.sub(repl, string).
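
    A minimal sketch of what "unmatched group" means, using a made-up pattern and input (both are assumptions for illustration, not taken from the question): when an optional group does not take part in a match, match.group(1) is None rather than an empty string. According to the re documentation, Python 3.5 and later substitute an empty string for such groups in a replacement template, so the error itself mainly shows up on older versions such as 2.7.

```python
import re

# Hypothetical pattern: group 1 is optional, so it may not take part in a match.
pattern = re.compile(r'(foo)?bar')

with_group = pattern.match('foobar')
without_group = pattern.match('bar')

print(with_group.group(1))     # the optional group matched
print(without_group.group(1))  # the optional group did not participate
```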

    Solution

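    For results, return match.group(1) from a replacement function instead of using \1 (or \2, \3, etc.) in a replacement string. That's it; no "or" fallback such as match.group(1) or '' is needed, because a replacement function that returns None contributes nothing to the output. The group result(s) can be returned with a function or a lambda.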

    Example

    You are using a common regex to strip C-style comments. Its design uses an optional group 1 to pass through double-quoted strings, which may contain comment-like text that should not be deleted.

    pattern = r'//.*|/\*[\s\S]*?\*/|("(\\.|[^"])*")'
    regex = re.compile(pattern)
    

    Using \1 fails with the error: "sre_constants.error: unmatched group":

    return regex.sub(r'\1', string)
    

    Using .group(1) succeeds:

    return regex.sub(lambda m: m.group(1), string)
    

    For those not familiar with lambda, this solution is equivalent to:

    def optgroup(match):
        return match.group(1)
    return regex.sub(optgroup, string)
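
    Putting the pieces together, here is a self-contained sketch of the approach. The function name strip_comments and the sample input are assumptions added for illustration; the pattern is the one shown above.

```python
import re

# Pattern from the answer: line comments, block comments, or (group 1) a
# double-quoted string that must be passed through unchanged.
PATTERN = re.compile(r'//.*|/\*[\s\S]*?\*/|("(\\.|[^"])*")')


def strip_comments(source):
    """Remove C-style comments while keeping string literals intact."""
    # m.group(1) is the string literal when one matched, and None for a
    # comment; sub treats a None return value like an empty replacement.
    return PATTERN.sub(lambda m: m.group(1), source)


# Hypothetical input: a string literal that contains comment-like text,
# followed by a line comment and a block comment.
sample = 'x = "a // b"; // trailing comment\ny = 1; /* block */'
cleaned = strip_comments(sample)
print(cleaned)
```

    The string literal survives because it is captured by group 1 and returned as-is, while both comments match without group 1 and are dropped.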
    

    See the accepted answer for an excellent discussion of why \1 fails (Bug 1519638). While the accepted answer is authoritative, it has two shortcomings: 1) the example from the original question is so convoluted that the example solution is hard to read, and 2) it suggests returning a group or an empty string. That is not required; you can simply return .group() for each match.
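
    One caveat when adapting this to the loop in the question, sketched under assumed values (the pattern and the input string old below are placeholders, since the question's real regex is not shown): returning match.group(1) on its own works, but concatenating it with other text does not, because adding None to a str raises TypeError. In that case a fallback to an empty string is still needed.

```python
import re

# Placeholder pattern and input; the question's actual regex is not shown.
pattern = re.compile(r'(<(\w+)>)?item')
old = 'item <b>item item'

new = old
for n in range(1, 4):
    # group(1) is None when the optional prefix is absent, so fall back to ''
    # before concatenating it with the rest of the replacement.
    new = pattern.sub(lambda m: 'something' + str(n) + (m.group(1) or ''),
                      new, count=1)
print(new)
```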
