How to split a string by commas positioned outside of parentheses?

南方客 2020-11-27 19:36

I have a string in the following format:

"Wilbur Smith (Billy, son of John), Eddie Murphy (John), Elvis Presley, Jane Doe (Jane Doe)"

so basically I

10 answers
  •  Happy的楠姐
    2020-11-27 20:13

    Here's a general technique I've used in the past for such cases:

    Use re.sub with a function as the replacement argument. The function keeps track of opening and closing parentheses, brackets and braces, as well as single and double quotes, and performs the replacement only outside such bracketed or quoted substrings. That lets you replace each non-bracketed, non-quoted comma with a character that you're sure doesn't appear in the string (I use the ASCII/Unicode group separator, chr(29)), and then do a simple str.split on that character. Here's the code:

    import re
    def srchrepl(srch, repl, string):
        """Replace non-bracketed/quoted occurrences of srch with repl in string"""
    
        resrchrepl = re.compile(r"""(?P<lbrkt>[([{])|(?P<quote>['"])|(?P<sep>["""
                                + srch + r"""])|(?P<rbrkt>[)\]}])""")
        return resrchrepl.sub(_subfact(repl), string)
    
    def _subfact(repl):
        """Replacement function factory for regex sub method in srchrepl."""
        level = 0
        qtflags = 0
        def subf(mo):
            nonlocal level, qtflags
            sepfound = mo.group('sep')
            if  sepfound:
                if level == 0 and qtflags == 0:
                    return repl
                else:
                    return mo.group(0)
            elif mo.group('lbrkt'):
                level += 1
                return mo.group(0)
            elif mo.group('quote') == "'":
                qtflags ^= 1            # toggle bit 1
                return "'"
            elif mo.group('quote') == '"':
                qtflags ^= 2            # toggle bit 2
                return '"'
            elif mo.group('rbrkt'):
                level -= 1
                return mo.group(0)
        return subf
    

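    If you don't have nonlocal in your version of Python (it was added in Python 3), change it to global and define level and qtflags at the module level. In that case, reset both to 0 before each call, because module-level state carries over between calls if the input has unbalanced brackets or quotes.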

    Here's how it's used:

    >>> GRPSEP = chr(29)
    >>> string = "Wilbur Smith (Billy, son of John), Eddie Murphy (John), Elvis Presley, Jane Doe (Jane Doe)"
    >>> lst = srchrepl(',', GRPSEP, string).split(GRPSEP)
    >>> lst
    ['Wilbur Smith (Billy, son of John)', ' Eddie Murphy (John)', ' Elvis Presley', ' Jane Doe (Jane Doe)']
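
    Note that every item after the first keeps its leading space. If you only need to handle parentheses (no quotes or other bracket types), a plain depth counter is a simpler alternative to the regex approach. The following is a minimal sketch of that different technique, not the code above; the function name split_outside_parens is made up for illustration, and it assumes a single-character separator and reasonably balanced parentheses:

```python
def split_outside_parens(text, sep=","):
    """Split text on sep, ignoring separators nested inside parentheses.

    Hypothetical helper: assumes sep is a single character and that
    quotes and other bracket types need no special handling.
    """
    parts = []
    current = []
    depth = 0  # current parenthesis nesting level
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        if ch == sep and depth == 0:
            # Top-level separator: close the current field.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    # Whatever follows the last separator is the final field.
    parts.append("".join(current).strip())
    return parts


names = split_outside_parens(
    "Wilbur Smith (Billy, son of John), Eddie Murphy (John), "
    "Elvis Presley, Jane Doe (Jane Doe)"
)
print(names)
```

    Each field is stripped as it is collected, so the result has no leading spaces. Unlike srchrepl, this sketch does not treat quoted substrings specially.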
    
