Regular expression matches more than expected

 ̄綄美尐妖づ posted on 2021-02-13 17:30:42

Question


Given the following Python script:

import re

text = '<?xml version="1.24" encoding="utf-8">'
mu = (".??[?]?[?]", "....")
for item in mu:
    # Print each pattern alongside the first match it finds in text
    print(item, ":", re.search(item, text).group())

Why does the first regex, .??[?]?[?], match <? rather than only the question mark?

My explanation:

  • .?? should match nothing, since .? matches zero or one of any character and the second ? makes it non-greedy.
  • [?]? matches ? or nothing, so matching nothing is fine here, too.
  • [?] just matches ?

That should result in ? and not in <?


Answer 1:


For the same reason o*?bar matches oobar in foobar. Even when a quantifier is non-greedy, the regex engine tries every possible way to match from the current start position before moving on to the next character.

First, .?? matches the empty string, but then [?] cannot match at <, so the attempt fails. The engine backtracks into .??, which now matches <. That lets the rest of the regex match the following ?, all without moving the start position of the match to the next character.
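Both cases can be checked with a minimal Python 3 sketch that uses only the standard re module. The variable names lazy_o and m are illustrative and not from the original post.

```python
import re

text = '<?xml version="1.24" encoding="utf-8">'

# The quantifier is lazy, but the match still begins at the leftmost
# position from which the whole pattern can succeed.
lazy_o = re.search(r"o*?bar", "foobar")
print(lazy_o.group())  # oobar

# Same effect with the question's pattern: .?? first tries the empty
# string, the attempt fails at '<', and backtracking makes .?? consume '<'.
m = re.search(r".??[?]?[?]", text)
print(m.group(), m.span())  # <? (0, 2)
```

The span (0, 2) shows that the match starts at index 0, the very first start position the engine tries.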




Answer 2:


Regex "greediness" only affects the order in which alternatives are tried during backtracking; it doesn't mean that the regex engine will skip earlier potential match positions. A regex always returns the leftmost possible match. In this case, that means <? because it starts farther to the left than ?.
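The leftmost-match rule can be made visible by trying the pattern at each start index by hand. This is only a rough sketch of what the engine does conceptually, not how the re module is implemented; the names pattern, first_start, first_match and later are assumptions for illustration.

```python
import re

text = '<?xml version="1.24" encoding="utf-8">'
pattern = re.compile(r".??[?]?[?]")

# Try to match at each start index in turn and stop at the first success.
for start in range(len(text)):
    m = pattern.match(text, start)
    if m:
        first_start, first_match = start, m.group()
        break

print(first_start, first_match)  # 0 <?

# Beginning the search one character later gives the shorter match
# that the question expected.
later = pattern.search(text, 1)
print(later.group(), later.span())  # ? (1, 2)
```

If only the question mark is wanted, a pattern that cannot consume the preceding character, such as [?] alone, avoids the issue entirely.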



Source: https://stackoverflow.com/questions/6589226/regular-expression-matches-more-than-expected
