Regex : Remove all commas between a quote separated string [python]

时光说笑 2020-12-12 06:18

What would be an appropriate regex to remove all commas in a string as such:

12, 1425073747, "test", "1, 2, 3, ... "

Result:

12, 1425073747, "test", "1 2 3 ... "

2 answers
  • 2020-12-12 06:20

    This should work for you:

    >>> import re
    >>> text = '12, 1425073747, "test", "1, 2, 3, ... "'
    >>> print(re.sub(r'(?!(([^"]*"){2})*[^"]*$),', "", text))
    12, 1425073747, "test", "1 2 3 ... "
    

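    The negative lookahead (?!(([^"]*"){2})*[^"]*$) fails whenever the comma is followed by an even number of quotes up to the end of the string, so the pattern only matches commas inside a quoted string (those followed by an odd number of quotes). It assumes the quotes are balanced and unescaped.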
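The quote-parity idea behind that lookahead can also be written as a plain left-to-right scan. The sketch below is only an illustration: `strip_quoted_commas_scan` is a hypothetical helper name, and like the regex it assumes balanced, unescaped double quotes.

```python
import re

# Pattern from the answer above: a comma that is NOT followed by an even
# number of double quotes, i.e. a comma sitting inside a quoted field.
COMMA_IN_QUOTES = re.compile(r'(?!(([^"]*"){2})*[^"]*$),')


def strip_quoted_commas_scan(text):
    """Hypothetical helper: same idea as the regex, done in a single pass."""
    out = []
    inside = False  # True while between an opening and a closing quote
    for ch in text:
        if ch == '"':
            inside = not inside  # every quote toggles the state
        if ch == ',' and inside:
            continue  # drop commas that sit inside a quoted field
        out.append(ch)
    return ''.join(out)


sample = '12, 1425073747, "test", "1, 2, 3, ... "'
print(COMMA_IN_QUOTES.sub('', sample))
print(strip_quoted_commas_scan(sample))
```

Both calls should print the same line for this sample. The scan visits each character once, whereas the lookahead rescans the rest of the string for every comma, which may matter on long inputs.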

  • 2020-12-12 06:25

    You can use re.sub with the simple regex r'"[^"]*"' and pass a callable as the replacement argument. The callable receives each match object, so you can manipulate the matched text before it is substituted back:

    import re
    text = '12, 1425073747, "test", "1, 2, 3, ... "'
    print(re.sub(r'"[^"]*"', lambda x: x.group().replace(",", ""), text))
    


    If the string between quotes may contain escaped quotes, use:

    re.sub(r'(?s)"[^"\\]*(?:\\.[^"\\]*)*"', lambda x: x.group().replace(",", ""), text)
    

    Here, (?s) is the inline version of the re.S / re.DOTALL flag, and the rest of the pattern matches a double-quoted string literal that may contain backslash-escaped characters.
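To see why the escape-aware pattern matters, here is a small comparison on a made-up input (the sample string is an assumption, not taken from the question) in which one quoted field contains \" sequences:

```python
import re

# Simple pattern: stops at the first double quote it meets, escaped or not.
SIMPLE = re.compile(r'"[^"]*"')
# Escape-aware pattern from above: a backslash consumes the character after it.
ESCAPED = re.compile(r'(?s)"[^"\\]*(?:\\.[^"\\]*)*"')


def drop_commas(match):
    # Remove commas from the matched quoted literal only.
    return match.group().replace(",", "")


# Hypothetical sample: the first quoted field contains escaped quotes.
text = r'12, "a \"b, c\" d", 34, "x, y"'

naive = SIMPLE.sub(drop_commas, text)
robust = ESCAPED.sub(drop_commas, text)
print(naive)   # comma between b and c survives
print(robust)  # comma between b and c is removed
```

With the simple pattern, the escaped quotes split the field into two short matches, so the comma between them is treated as being outside any quotes.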

    Bonus

    • Remove all whitespace in between double quotes: re.sub(r'"[^"]*"', lambda x: ''.join(x.group().split()), text)
    • Remove all non-digit chars inside double quotes: re.sub(r'"[^"]*"', lambda x: ''.join(c for c in x.group() if c.isdigit()), text)
    • Remove all digit chars inside double quotes: re.sub(r'"[^"]*"', lambda x: ''.join(c for c in x.group() if not c.isdigit()), text)
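The bonus one-liners all share the same shape, so they could be wrapped in a small helper. This is just a sketch: `transform_quoted` is a hypothetical name, and it uses the simple pattern, so escaped quotes are not handled.

```python
import re

QUOTED = re.compile(r'"[^"]*"')


def transform_quoted(text, func):
    """Hypothetical helper: apply func to each double-quoted substring.

    func receives the match text including the surrounding quotes.
    """
    return QUOTED.sub(lambda m: func(m.group()), text)


text = '12, 1425073747, "test", "1, 2, 3, ... "'

no_commas = transform_quoted(text, lambda s: s.replace(",", ""))
no_space = transform_quoted(text, lambda s: "".join(s.split()))
digits_only = transform_quoted(text, lambda s: "".join(c for c in s if c.isdigit()))

print(no_commas)
print(no_space)
print(digits_only)
```

Because the matched text includes the quotes themselves, the digits-only variant also strips the surrounding quotes. If they should be kept, one option is to transform s[1:-1] and wrap the result in quotes again.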