What's the profit of using /.*?/

Submitted by 纵饮孤独 on 2019-12-21 09:14:32

Question


In some Rails code (Cucumber step definitions, JavaScript files, the rails_admin gem) I found regular expression fragments like this one:

string =~ /some regexp.+rules should match "(.*?)"/i

I have some knowledge of regular expressions, and I know that the * and ? symbols are similar: the asterisk means zero or more occurrences, while the question mark means zero or one (the item may or may not be present).

So placing a question mark after a group of symbols makes that group optional in the string being tested. What, then, is the point of putting it after a group that is already optional (as far as I know, the asterisk already allows zero occurrences)?


Answer 1:


Right after a quantifier (like *), the ? has a different meaning and makes it "ungreedy". So while the default is that * consumes as much as possible, *? matches as little as possible.

In your specific case, this is relevant for strings like this:

some regexp rules should match "some string" or "another"

Without the question mark, the regex matches the full string (because .* can consume " just like any other character), and some string" or "another is captured. With the question mark, the match stops as soon as possible (right after ...some string") and captures only some string.
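To make the difference concrete, here is a minimal Ruby sketch that runs both variants against the example string above. The variable names (`line`, `greedy`, `lazy`) are only illustrative; the patterns are the one from the question with and without the trailing ?.

```ruby
# The example string from this answer.
line = 'some regexp rules should match "some string" or "another"'

# Same pattern twice: greedy .* versus lazy .*? inside the capture group.
greedy = /some regexp.+rules should match "(.*)"/i
lazy   = /some regexp.+rules should match "(.*?)"/i

# String#[] with a regexp and a group index returns that capture group.
greedy_capture = line[greedy, 1]
lazy_capture   = line[lazy, 1]

puts greedy_capture # some string" or "another
puts lazy_capture   # some string
```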





Answer 2:


? has a dual meaning.

/foo?/

means the last o can appear zero times or one time.

/foo*?/

means the last o can appear zero or more times, but the engine picks the smallest number that still lets the match succeed; that is, it's non-greedy.

These might help explain:

'foo'[/foo?/]   # => "foo"
'fo'[/foo?/]    # => "fo"
'fo'[/foo*?/]   # => "fo"
'foo'[/foo*?/]  # => "fo"
'fooo'[/foo*?/] # => "fo"

The non-greedy use of ? is unfortunate, I think. An operator we expected to have a single meaning, "zero or one", was reused in a way that can be really difficult to decipher.

But the need was genuine: too many times we'd write a pattern that would go wildly wrong, gobbling everything in sight, because the regex engine was doing exactly what we said with character patterns we hadn't foreseen. Regex can be very complex and convoluted, and the "non-greedy" use of ? helps tame that. Sometimes using it is the sloppy, quick-n-dirty way out when we don't have time to rewrite the pattern to do it correctly. Sometimes it's the magic bullet and is elegant. I think which one it is depends on whether you're under a deadline and writing code to get something done, or you're debugging years after the fact and have finally found that ? wasn't the optimal fix.
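As a rough illustration of what "rewriting the pattern" could look like, one common alternative (an assumption on my part, not necessarily what this answer had in mind) is a negated character class such as [^"]* in place of .*?. The sketch below uses made-up strings and variable names to show where the two agree and where they differ.

```ruby
line = 'some regexp rules should match "some string" or "another"'

# On the simple case both approaches capture the same text.
lazy_capture    = line[/should match "(.*?)"/, 1]
negated_capture = line[/should match "([^"]*)"/, 1]

# Hypothetical input where something must follow the closing quote.
tricky = 'match "a" x "b" end'

# Lazy .*? starts at the first quote and keeps expanding until the rest
# of the pattern fits, so it ends up crossing the inner quotes.
lazy_tricky = tricky[/"(.*?)" end/, 1]

# [^"]* can never cross a quote, so only the last quoted part matches.
negated_tricky = tricky[/"([^"]*)" end/, 1]

puts lazy_capture    # some string
puts negated_capture # some string
puts lazy_tricky     # a" x "b
puts negated_tricky  # b
```

The lazy quantifier stops at the first point where the whole pattern succeeds, which is not always the shortest quoted piece; the negated class states the intent ("anything but a quote") directly.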




Answer 3:


It makes the search non-greedy. That means the quantifier matches as few characters as it can from the point where the match starts, rather than as many as it can.




Answer 4:


Consider this string

"<person>1</person><person>2</person>"

the regex

<person>.*</person> would match <person>1</person><person>2</person>

So .* is greedy.

the regex

<person>.*?</person> would match <person>1</person> first, and then <person>2</person> on the next match

So .*? is lazy.
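A short Ruby sketch of the same comparison, using String#scan to collect every match; the variable names are illustrative only.

```ruby
xml = "<person>1</person><person>2</person>"

# Greedy: .* runs to the last </person>, so there is a single match.
greedy_matches = xml.scan(/<person>.*<\/person>/)

# Lazy: .*? stops at the first </person>, so each element matches on its own.
lazy_matches = xml.scan(/<person>.*?<\/person>/)

p greedy_matches # ["<person>1</person><person>2</person>"]
p lazy_matches   # ["<person>1</person>", "<person>2</person>"]
```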



Source: https://stackoverflow.com/questions/13401514/whats-the-profit-of-using
