Question
When I have a string like this:
s1 = 'stuff(remove_me)'
I can easily remove the parentheses and the text within using
import re
# returns 'stuff'
res1 = re.sub(r'\([^)]*\)', '', s1)
as explained here.
But I sometimes encounter nested expressions like this:
s2 = 'stuff(remove(me))'
When I run the command from above, I end up with
'stuff)'
I also tried:
re.sub(r'\(.*?\)', '', s2)
which gives me the same output.
How can I remove everything within the outer parentheses, including the parentheses themselves, so that I also end up with 'stuff' (which should work for arbitrarily complex expressions)?
Answer 1:
re quantifiers are greedy, so they try to match as much text as possible. For the simple test case you mention, just let the regex run greedily:
>>> re.sub(r'\(.*\)', '', 'stuff(remove(me))')
'stuff'
Answer 2:
NOTE: \(.*\) matches the first ( from the left, then matches any 0+ characters (other than a newline, unless the re.DOTALL flag is enabled) up to the last ), and does not account for properly nested parentheses.
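To make the NOTE concrete, here is a small check using only the standard re module (greedy_strip is a hypothetical helper name). The greedy pattern happens to work on the question's s2, but with two separate groups it also deletes the text between them.

```python
import re

GREEDY = r'\(.*\)'  # first "(" through the LAST ")" on the line


def greedy_strip(text):
    # Delete everything from the first "(" to the last ")".
    return re.sub(GREEDY, '', text)


print(greedy_strip('stuff(remove(me))'))     # stuff
print(greedy_strip('keep(a) this (b) end'))  # keep end  ("this" is lost too)
```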
To remove nested parentheses correctly with a regular expression in Python, you may use a simple \([^()]*\) (matching a (, then 0+ chars other than ( and ), and then a )) in a while block using re.subn:
import re

def remove_text_between_parens(text):
    n = 1  # run at least once
    while n:
        # remove non-nested/flat balanced parts; n is the number of substitutions made
        text, n = re.subn(r'\([^()]*\)', '', text)
    return text
Basically: remove every (...) that has no ( or ) inside, and repeat until no match is found. Usage:
print(remove_text_between_parens('stuff (inside (nested) brackets) (and (some(are)) here) here'))
# => stuff   here  (the spaces around the removed groups remain)
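To see the inside-out order in which the loop removes groups, here is a minimal sketch using only the standard re module. peel_parens_trace and FLAT_GROUP are hypothetical names, not part of the answer. Each pass deletes every group that currently has no parentheses inside, so the loop runs one pass per nesting level, plus a final pass that finds nothing.

```python
import re

FLAT_GROUP = r'\([^()]*\)'  # a "(...)" with no parentheses inside


def peel_parens_trace(text):
    """Hypothetical helper: runs the same re.subn loop and records each pass."""
    passes = [text]
    while True:
        text, n = re.subn(FLAT_GROUP, '', text)
        if n == 0:  # nothing flat left to remove
            break
        passes.append(text)
    return passes


for step in peel_parens_trace('stuff(remove(me))'):
    print(repr(step))
# 'stuff(remove(me))'
# 'stuff(remove)'
# 'stuff'
```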
A non-regex way is also possible:
def removeNestedParentheses(s):
    ret = ''
    skip = 0  # current nesting depth
    for i in s:
        if i == '(':
            skip += 1
        elif i == ')' and skip > 0:
            skip -= 1
        elif skip == 0:  # keep only characters outside any parentheses
            ret += i
    return ret
x = removeNestedParentheses('stuff (inside (nested) brackets) (and (some(are)) here) here')
print(x)
# => stuff   here  (the spaces around the removed groups remain)
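The same depth-counting idea can be sketched with a list and ''.join, which avoids repeated string concatenation. strip_paren_groups is a hypothetical name; the logic mirrors removeNestedParentheses. Note how it treats unbalanced input: a stray ) outside any group is kept, while an unclosed ( drops everything after it.

```python
def strip_paren_groups(s):
    """Hypothetical name; same depth-counting idea as removeNestedParentheses."""
    kept = []
    depth = 0  # how many "(" are currently open
    for ch in s:
        if ch == '(':
            depth += 1
        elif ch == ')' and depth > 0:
            depth -= 1
        elif depth == 0:
            kept.append(ch)  # outside every group (includes a stray ")")
    return ''.join(kept)


print(repr(strip_paren_groups('stuff(remove(me))')))  # 'stuff'
print(repr(strip_paren_groups('a)b(c')))              # 'a)b'
```

If the extra spaces left where groups were removed matter, ' '.join(result.split()) collapses runs of whitespace.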
Answer 3:
As mentioned before, you'd need a recursive regex to match arbitrary levels of nesting (which Python's built-in re module does not support). But if you know there can be at most one level of nesting, try this pattern:
\((?:[^)(]|\([^)(]*\))*\)
[^)(] matches a character that is not a parenthesis (negated class).
|\([^)(]*\) or it matches another ( ) pair with any number of non-)( characters inside.
(?:...)* repeats all of this any number of times inside the outer ( ).
In front of the alternation I used [^)(] without a + quantifier, so that the match fails faster if the parentheses are unbalanced.
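A quick check of the one-level pattern with Python's re.sub (strip_one_level is a hypothetical name). The second call shows what happens when the input is nested deeper than the pattern allows: only the inner part that fits is removed, leaving a(b).

```python
import re

# At most one level of nesting inside the outer "(...)"
ONE_LEVEL = r'\((?:[^)(]|\([^)(]*\))*\)'


def strip_one_level(text):
    return re.sub(ONE_LEVEL, '', text)


print(repr(strip_one_level('stuff(remove(me))')))  # 'stuff'
print(repr(strip_one_level('x(a) y(b(c))')))       # 'x y'
print(repr(strip_one_level('a(b(c(d)))')))         # 'a(b)': too deep for this pattern
```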
You need to extend the pattern by one level for each extra level of nesting that might occur. E.g. for a maximum of 2 levels:
\((?:[^)(]|\((?:[^)(]|\([^)(]*\))*\))*\)
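Each extra level is built by wrapping the previous pattern in the same template, so the pattern can be generated instead of written by hand. A minimal sketch (nested_paren_pattern is a hypothetical helper): max_depth=1 reproduces the one-level pattern and max_depth=2 the two-level one.

```python
import re


def nested_paren_pattern(max_depth):
    """Hypothetical helper: allow up to max_depth nested levels inside the outer ( )."""
    pattern = r'\([^)(]*\)'  # a flat group (no nesting)
    for _ in range(max_depth):
        # wrap: a group whose contents are non-parens or a shallower group
        pattern = r'\((?:[^)(]|' + pattern + r')*\)'
    return pattern


print(nested_paren_pattern(2))
print(re.sub(nested_paren_pattern(2), '', 'x(b(c(d)))y'))  # xy
```

The pattern grows with every level, so when the depth is unknown the re.subn loop or the character counter shown earlier is usually simpler.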
Answer 4:
If you are sure that the parentheses are balanced and there is only one outermost group, just use the greedy version:
re.sub(r'\(.*\)', '', s2)
Answer 5:
https://regex101.com/r/kQ2jS3/1
r'(\(.*\))'
This captures from the first ( up to the furthest ), so both the outer parentheses and everything between them are matched.
Your old regex captures the first ( and everything up to the next ), which is why a stray ) was left over.
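To see what each pattern actually matches on the question's s2, here is a short check with re.search and re.sub (standard library only):

```python
import re

s2 = 'stuff(remove(me))'

# Greedy: from the first "(" to the furthest ")"
greedy = re.search(r'(\(.*\))', s2)
print(greedy.group(1))                  # (remove(me))

# Old lazy pattern: from the first "(" to the next ")"
lazy = re.search(r'\(.*?\)', s2)
print(lazy.group(0))                    # (remove(me)

print(re.sub(r'(\(.*\))', '', s2))      # stuff
print(re.sub(r'\(.*?\)', '', s2))       # stuff)
```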
Source: https://stackoverflow.com/questions/37528373/how-to-remove-all-text-between-the-outer-parentheses-in-a-string