Python non-greedy regexes

点点圈 提交于 2019-11-25 22:17:32
Trey Stout

You seek the all-powerful '*?'

http://docs.python.org/3/howto/regex.html#greedy-versus-non-greedy

the non-greedy qualifiers *?, +?, ??, or {m,n}? [...] match as little text as possible.

>>> x = "a (b) c (d) e"
>>> re.search(r"\(.*\)", x).group()
'(b) c (d)'
>>> re.search(r"\(.*?\)", x).group()
'(b)'

According to the docs:

The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible. Sometimes this behavior isn’t desired; if the RE <.*> is matched against '<H1>title</H1>', it will match the entire string, and not just '<H1>'. Adding '?' after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using .*? in the previous expression will match only '<H1>'.

Zitrax

Would not \\(.*?\\) work? That is the non-greedy syntax.

As the others have said using the ? modifier on the * quantifier will solve your immediate problem, but be careful, you are starting to stray into areas where regexes stop working and you need a parser instead. For instance, the string "(foo (bar)) baz" will cause you problems.

Using an ungreedy match is a good start, but I'd also suggest that you reconsider any use of .* -- what about this?

groups = re.search(r"\([^)]*\)", x)

Do you want it to match "(b)"? Do as Zitrax and Paolo have suggested. Do you want it to match "b"? Do

>>> x = "a (b) c (d) e"
>>> re.search(r"\((.*?)\)", x).group(1)
'b'
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!