Question
How do I make a Python regex like "(.*)" such that, given "a (b) c (d) e", Python matches "b" instead of "b) c (d"?
I know that I can use "[^)]" instead of ".", but I'm looking for a more general solution that keeps my regex a little cleaner. Is there any way to tell Python "hey, match this as soon as possible"?
Answer 1:
You seek the all-powerful '*?'
http://docs.python.org/3/howto/regex.html#greedy-versus-non-greedy
the non-greedy qualifiers *?, +?, ??, or {m,n}? [...] match as little text as possible.
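A minimal sketch of how each of those qualifiers behaves next to its greedy counterpart. The sample strings and variable names are made up for illustration; only the standard `re` module is used.

```python
import re

text = "<H1>title</H1>"

# Greedy quantifiers take as much as they can; a trailing "?" makes them lazy.
greedy_star = re.search(r"<.*>", text).group()   # runs to the last ">"
lazy_star = re.search(r"<.*?>", text).group()    # stops at the first ">"
lazy_plus = re.search(r"<.+?>", text).group()    # same, but needs >= 1 character

# "??" prefers zero occurrences; "{m,n}?" prefers the minimum count m.
lazy_optional = re.match(r"a??", "aaa").group()
lazy_counted = re.match(r"a{2,4}?", "aaaa").group()
greedy_counted = re.match(r"a{2,4}", "aaaa").group()

print(greedy_star, lazy_star, lazy_plus)
print(repr(lazy_optional), lazy_counted, greedy_counted)
```

Note that a lazy quantifier still gives back the leftmost match; it only changes how far that match extends.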
Answer 2:
>>> import re
>>> x = "a (b) c (d) e"
>>> re.search(r"\(.*\)", x).group()
'(b) c (d)'
>>> re.search(r"\(.*?\)", x).group()
'(b)'
According to the docs:
The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible. Sometimes this behavior isn’t desired; if the RE <.*> is matched against '<H1>title</H1>', it will match the entire string, and not just '<H1>'. Adding '?' after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using .*? in the previous expression will match only '<H1>'.
Answer 3:
Would not \(.*?\) work? That is the non-greedy syntax.
Answer 4:
As the others have said, using the ? modifier on the * quantifier will solve your immediate problem, but be careful: you are starting to stray into areas where regexes stop working and you need a parser instead. For instance, the string "(foo (bar)) baz" will cause you problems, because neither a greedy nor a non-greedy pattern keeps track of nesting.
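To make the nesting problem concrete, here is a rough sketch. The helper `top_level_groups` is hypothetical (it is not part of `re` or any library) and stands in for "a parser" in the simplest possible form: a depth counter. It silently ignores unbalanced parentheses, which a real parser would probably report.

```python
import re

nested = "(foo (bar)) baz"

# Neither quantifier understands nesting.
lazy = re.search(r"\(.*?\)", nested).group()    # stops at the first ")"
greedy = re.search(r"\(.*\)", nested).group()   # runs to the last ")" in the string


def top_level_groups(text):
    """Return the contents of each balanced top-level (...) group.

    Hypothetical helper for illustration; unbalanced input is not reported.
    """
    groups, depth, start = [], 0, None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i + 1      # remember where the outermost group begins
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:         # outermost group just closed
                groups.append(text[start:i])
    return groups


print(lazy)                               # (foo (bar)
print(greedy)                             # (foo (bar))
print(top_level_groups(nested))           # ['foo (bar)']
print(top_level_groups("a (b) c (d) e"))  # ['b', 'd']
```

The greedy result only looks right here because the string has a single top-level group; with two groups it would swallow everything between them.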
Answer 5:
Using an ungreedy match is a good start, but I'd also suggest that you reconsider any use of .* -- what about this?
groups = re.search(r"\([^)]*\)", x)
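A short sketch of what the negated character class gives you, assuming the same sample string as the question. The second half uses a made-up multi-line string to show one practical difference: by default `.` does not match a newline, while `[^)]` does.

```python
import re

x = "a (b) c (d) e"

# [^)]* cannot run past a ")", so no lazy quantifier is needed.
first = re.search(r"\([^)]*\)", x).group()
every = re.findall(r"\([^)]*\)", x)

# By default "." does not match a newline, but [^)] does.
multiline = "a (b\nc) d"
dot_result = re.search(r"\(.*?\)", multiline)
class_result = re.search(r"\([^)]*\)", multiline).group()

print(first)               # (b)
print(every)               # ['(b)', '(d)']
print(dot_result)          # None
print(repr(class_result))  # '(b\nc)'
```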
Answer 6:
Do you want it to match "(b)"? Do as Zitrax and Paolo have suggested. Do you want it to match "b"? Then use a capturing group:
>>> x = "a (b) c (d) e"
>>> re.search(r"\((.*?)\)", x).group(1)
'b'
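If you want the contents of every pair rather than just the first, the same capturing group works with `re.finditer` or `re.findall`. This is only a sketch on the question's sample string; the variable names are illustrative.

```python
import re

x = "a (b) c (d) e"

# With one capturing group, findall returns the group contents directly.
contents = re.findall(r"\((.*?)\)", x)

# finditer gives match objects, useful when positions are also needed.
spans = [(m.group(1), m.start(1)) for m in re.finditer(r"\((.*?)\)", x)]

print(contents)  # ['b', 'd']
print(spans)     # [('b', 3), ('d', 9)]
```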
Answer 7:
To start with, I do not suggest using "*" in regexes. Yes, I know, it is the most used repetition quantifier, but it is nevertheless a bad idea. This is because, while it does match any number of repetitions of the preceding element, "any" includes 0, which is usually something you want to reject as a syntax error, not accept. Instead, I suggest using the + quantifier, which matches one or more repetitions. What's more, from what I can see, you are dealing with fixed-length parenthesized expressions. As a result, you can probably use the {m,n} syntax to specify the desired length.
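A quick sketch of the difference, using made-up inputs like "f()" and "f(abc)". The bound of 1 to 3 characters is an arbitrary assumption chosen for the example.

```python
import re

# "*" accepts zero repetitions, so empty parentheses match.
star_empty = re.search(r"\(.*?\)", "f()").group()

# "+" requires at least one character between the parentheses.
plus_empty = re.search(r"\(.+?\)", "f()")
plus_one = re.search(r"\(.+?\)", "f(x)").group()

# {m,n} bounds the count; here 1 to 3 characters other than ")".
bounded_ok = re.search(r"\([^)]{1,3}\)", "f(abc)").group()
bounded_long = re.search(r"\([^)]{1,3}\)", "f(abcd)")

print(star_empty)    # ()
print(plus_empty)    # None
print(plus_one)      # (x)
print(bounded_ok)    # (abc)
print(bounded_long)  # None
```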
However, if you really do need non-greedy repetition, I suggest consulting the all-powerful ?. This, when placed after any regex repetition specifier, will force that part of the regex to match the least amount of text possible.
That being said, I would be very careful with the ? as it, like the Sonic Screwdriver in Doctor Who, has a tendency to do, how should I put it, "slightly" undesired things if not carefully calibrated. For example, given nested input such as ((1)), it would identify ((1) (note the lack of a second rparen) as a match.
Source: https://stackoverflow.com/questions/766372/python-non-greedy-regexes