What would be an appropriate regex to remove all commas inside the double-quoted parts of a string such as this:
12, 1425073747, "test", "1, 2, 3, ... "
Result:
12, 1425073747, "test", "1 2 3 ... "
This should work for you:
>>> import re
>>> text = '12, 1425073747, "test", "1, 2, 3, ... "'
>>> print(re.sub(r'(?!(([^"]*"){2})*[^"]*$),', "", text))
12, 1425073747, "test", "1 2 3 ... "
The negative lookahead (?!(([^"]*"){2})*[^"]*$)
lets the comma match only when it is inside quotes: it rejects any comma that is followed by an even number of quotes before the end of the string.
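To make the parity idea concrete, here is a minimal sketch that wraps the same pattern in a helper. The names COMMA_IN_QUOTES and strip_quoted_commas are made up for illustration, and the approach assumes the quotes are balanced and never escaped.

```python
import re

# A comma that is NOT followed by an even number of double quotes up to the
# end of the string, i.e. a comma sitting inside a quoted segment.
# Assumption: quotes are balanced and there are no escaped quotes.
COMMA_IN_QUOTES = re.compile(r'(?!(([^"]*"){2})*[^"]*$),')

def strip_quoted_commas(text):
    """Remove commas that appear inside double-quoted segments (hypothetical helper)."""
    return COMMA_IN_QUOTES.sub("", text)

sample = '12, 1425073747, "test", "1, 2, 3, ... "'
print(strip_quoted_commas(sample))
# 12, 1425073747, "test", "1 2 3 ... "
```

Because the lookahead only counts quotes to the right of each comma, a single stray quote flips the parity for everything before it, so this is best kept for input known to be well formed.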
You may use re.sub with a simple r'"[^"]*"' regex and pass a callable as the replacement argument; it receives each match object, so you can manipulate the matched text further:
import re
text = '12, 1425073747, "test", "1, 2, 3, ... "'
print(re.sub(r'"[^"]*"', lambda x: x.group().replace(",", ""), text))
If the string between quotes may contain escaped quotes, use:
re.sub(r'(?s)"[^"\\]*(?:\\.[^"\\]*)*"', lambda x: x.group().replace(",", ""), text)
Here, (?s) is the inline version of the re.S / re.DOTALL flag, and the rest is a pattern that matches a double-quoted string literal.
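A small comparison may help show why the longer pattern matters. This is only a sketch: the sample string and the names SIMPLE, ESCAPE_AWARE and drop_commas are invented for the illustration.

```python
import re

SIMPLE = re.compile(r'"[^"]*"')
ESCAPE_AWARE = re.compile(r'(?s)"[^"\\]*(?:\\.[^"\\]*)*"')

def drop_commas(match):
    # Replacement callable: return the matched text without its commas.
    return match.group().replace(",", "")

# Hypothetical input: one quoted field that contains backslash-escaped quotes.
text = r'1, "a, \"b, c\", d", 2'

print(SIMPLE.sub(drop_commas, text))
# 1, "a \"b, c\" d", 2    (the comma in "b, c" is missed)
print(ESCAPE_AWARE.sub(drop_commas, text))
# 1, "a \"b c\" d", 2
```

The simple pattern stops at the first escaped quote, so it treats the field as two shorter quoted segments and leaves the text between them untouched; the escape-aware pattern consumes each backslash together with the character after it and therefore sees the field as a whole.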
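Bonus
Remove all whitespace inside double quotes:
re.sub(r'"[^"]*"', lambda x: ''.join(x.group().split()), text)
Keep only the digits of each double-quoted substring (the quotes themselves are dropped as well):
re.sub(r'"[^"]*"', lambda x: ''.join(c for c in x.group() if c.isdigit()), text)
Remove the digits inside double quotes:
re.sub(r'"[^"]*"', lambda x: ''.join(c for c in x.group() if not c.isdigit()), text)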
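The three one-liners above share the same shape, so they can be sketched as one small helper that applies any transformation to each quoted segment. The name map_quoted is hypothetical, and the outputs in the comments are for the sample string from the question.

```python
import re

QUOTED = re.compile(r'"[^"]*"')

def map_quoted(transform, text):
    """Apply transform to every double-quoted segment, quotes included (hypothetical helper)."""
    return QUOTED.sub(lambda m: transform(m.group()), text)

text = '12, 1425073747, "test", "1, 2, 3, ... "'

# Strip whitespace inside quotes.
no_ws = map_quoted(lambda s: "".join(s.split()), text)
# Keep only digits; the quote characters are not digits, so they vanish too.
only_digits = map_quoted(lambda s: "".join(c for c in s if c.isdigit()), text)
# Drop digits inside quotes.
no_digits = map_quoted(lambda s: "".join(c for c in s if not c.isdigit()), text)

print(no_ws)        # 12, 1425073747, "test", "1,2,3,..."
print(only_digits)  # 12, 1425073747, , 123
print(no_digits)    # 12, 1425073747, "test", ", , , ... "
```

Note that the transformation receives the surrounding quotes as part of the matched text, which is why the digits-only variant removes them.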