Question
What would be an appropriate regex to remove all commas inside the quoted parts of a string such as:
12, 1425073747, "test", "1, 2, 3, ... "
Result:
12, 1425073747, "test", "1 2 3 ... "
What I have that matches correctly:
"((\d+), )+\d+"
However, I obviously can't replace this with $1 $2. I can't use "\d+, \d+" because it will match 12, 1425073747, which is not what I want. If someone can explain how to recursively parse out values, that would be appreciated as well.
Answer 1:
This should work for you:
>>> import re
>>> text = '12, 1425073747, "test", "1, 2, 3, ... "'
>>> print(re.sub(r'(?!(([^"]*"){2})*[^"]*$),', "", text))
12, 1425073747, "test", "1 2 3 ... "
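The negative lookahead (?!(([^"]*"){2})*[^"]*$) makes the comma match only when it is inside quotes: it rejects any comma that is followed by an even number of quotes up to the end of the string. This assumes the quotes in the input are balanced.
Answer 2: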
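To make the lookahead idea easier to try out, here is a minimal sketch in Python 3. The names INSIDE_QUOTES_COMMA and strip_quoted_commas are made up for illustration, the lookahead is written with a non-capturing group after the comma instead of before it, and the whole thing assumes balanced, unescaped double quotes.

```python
import re

# A comma is "inside quotes" when an odd number of double quotes follows it.
# The negative lookahead rejects commas followed by an even number of quotes
# (zero or more complete "..." pairs) up to the end of the string.
# Assumption: quotes are balanced and never escaped.
INSIDE_QUOTES_COMMA = re.compile(r',(?!(?:[^"]*"[^"]*")*[^"]*$)')

def strip_quoted_commas(line):
    """Remove commas that sit between a pair of double quotes."""
    return INSIDE_QUOTES_COMMA.sub("", line)

sample = '12, 1425073747, "test", "1, 2, 3, ... "'
result = strip_quoted_commas(sample)
print(result)  # 12, 1425073747, "test", "1 2 3 ... "
```

Because the lookahead rescans the rest of the string for every comma, this approach can get slow on very long lines.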
You may use re.sub with a simple r'"[^"]*"' regex and pass a callable as the replacement argument; it receives each match object, which you can then manipulate further:
import re
text = '12, 1425073747, "test", "1, 2, 3, ... "'
print( re.sub(r'"[^"]*"', lambda x: x.group().replace(",", ""), text) )
If the string between quotes may contain escaped quotes, use
re.sub(r'(?s)"[^"\\]*(?:\\.[^"\\]*)*"', lambda x: x.group().replace(",", ""), text)
Here, (?s) is the inline version of the re.S / re.DOTALL flag, and the rest is a pattern that matches a double-quoted string literal with backslash escapes.
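A small sketch of how the two patterns differ on input with escaped quotes. The sample string and the names SIMPLE, ESCAPE_AWARE and drop_commas are invented for this illustration; only the two regexes come from the answer.

```python
import re

# Simple pattern: stops at the first double quote, escaped or not.
SIMPLE = re.compile(r'"[^"]*"')
# Escape-aware pattern: a backslash followed by any char is consumed as a unit,
# so \" does not end the quoted string. re.DOTALL plays the role of (?s).
ESCAPE_AWARE = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"', re.DOTALL)

def drop_commas(pattern, text):
    """Remove commas inside every span matched by pattern."""
    return pattern.sub(lambda m: m.group().replace(",", ""), text)

# Hypothetical input: the quoted field contains two escaped quotes.
sample = r'1, "say \"a, b\", ok", 2'

simple_result = drop_commas(SIMPLE, sample)
aware_result = drop_commas(ESCAPE_AWARE, sample)
print(simple_result)  # 1, "say \"a, b\" ok", 2   (comma after a is missed)
print(aware_result)   # 1, "say \"a b\" ok", 2
```

The simple pattern treats the escaped quote as a closing quote, so it pairs the quotes up wrongly and leaves one comma behind.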
Bonus
- Remove all whitespace between double quotes:
re.sub(r'"[^"]*"', lambda x: ''.join(x.group().split()), text)
- Remove all non-digit chars inside double quotes (the quotes themselves are matched too, so they are removed as well):
re.sub(r'"[^"]*"', lambda x: ''.join(c for c in x.group() if c.isdigit()), text)
- Remove all digit chars inside double quotes:
re.sub(r'"[^"]*"', lambda x: ''.join(c for c in x.group() if not c.isdigit()), text)
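The bonus variants all follow the same shape, so they can be folded into one helper. This is only a sketch: transform_quoted is a hypothetical name, and it reuses the simple pattern, so it assumes there are no escaped quotes.

```python
import re

QUOTED = re.compile(r'"[^"]*"')

def transform_quoted(text, func):
    """Apply func to every double-quoted span (quotes included) in text."""
    return QUOTED.sub(lambda m: func(m.group()), text)

text = '12, 1425073747, "test", "1, 2, 3, ... "'

no_space = transform_quoted(text, lambda s: "".join(s.split()))
only_digits = transform_quoted(text, lambda s: "".join(c for c in s if c.isdigit()))
no_digits = transform_quoted(text, lambda s: "".join(c for c in s if not c.isdigit()))

print(no_space)     # 12, 1425073747, "test", "1,2,3,..."
print(only_digits)  # 12, 1425073747, , 123
print(no_digits)    # 12, 1425073747, "test", ", , , ... "
```

Note that the match includes the surrounding quotes, so the digit-only variant strips them along with everything else that is not a digit.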
Source: https://stackoverflow.com/questions/28775048/regex-remove-all-commas-between-a-quote-separated-string-python