Regular Expression nongreedy is greedy

Submitted by £可爱£侵袭症+ on 2020-01-02 02:31:10

Question


I have the following text

tooooooooooooon

According to the book I'm reading, when a ? follows any quantifier, that quantifier becomes non-greedy.

My regex to*?n is still returning tooooooooooooon.

It should return ton, shouldn't it?

Any idea why?


Answer 1:


A regular expression can only match a fragment of text that actually exists.

Because the substring 'ton' doesn't exist anywhere in your string, it can't be the result of a match. A match can only return a substring of the original string.

EDIT: To be clear, if you were using the string below, with an extra 'n'

toooooooonoooooon

this regular expression (which doesn't specify 'o's)

t.*n

would match the following (as many characters as possible before an 'n')

toooooooonoooooon

but the regular expression

t.*?n

would only match the following (as few characters as possible before an 'n')

toooooooon
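
The difference between the two patterns can be checked directly. Here is a minimal sketch using Python's `re` module (Python is an assumption here, chosen only for illustration; the question does not name a language, and `*` and `*?` behave the same way in other backtracking engines):

```python
import re

text = "toooooooonoooooon"

# Greedy: .* takes as many characters as it can, so the match
# runs all the way to the last 'n' in the string.
greedy = re.search(r"t.*n", text).group()

# Non-greedy: .*? takes as few characters as it can, so the match
# stops at the first 'n' it can reach.
lazy = re.search(r"t.*?n", text).group()

print(greedy)  # the whole string
print(lazy)    # everything up to and including the first 'n'
```

Both results are substrings that really occur in the input; the `?` only changes which of the available candidates is chosen.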



Answer 2:


A regular expression is always eager to match.

Your expression says this:

A 't', followed by *as few as possible* 'o's, followed by an 'n'.

That means as many o's as necessary will be matched, because there is an 'n' at the end, which the expression is eager to reach. Matching all the o's is its only way to succeed.
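
As a quick check of this, the sketch below (in Python's `re` module, an assumption made only for illustration) runs both the lazy and the greedy version of the pattern against the string from the question. Since there is only one way to get from the 't' to the 'n', both end up with the same match:

```python
import re

text = "tooooooooooooon"

lazy = re.search(r"to*?n", text).group()    # as few 'o's as possible
greedy = re.search(r"to*n", text).group()   # as many 'o's as possible

# Only one candidate exists, so laziness cannot pick a shorter one.
print(lazy == greedy == text)  # True

# The shorter result the question expected is not in the input at all.
print("ton" in text)           # False
```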




Answer 3:


Regexps try to match everything in them. Because no fewer 'o's than every o in toooon will let the n match, everything is matched. Also, because you are using o*? instead of o+?, you are not requiring an o to be present.

Example, in Perl

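# $a and $b are reserved for sort(), so use descriptive names instead.
my $without_n = "toooooo";
my $with_n    = "toooooon";

# Nothing follows o*?, so it is free to match zero 'o's.
if ($without_n =~ m/(to*?)/) {
        print $1,"\n";
}
# The trailing 'n' forces o*? to consume every 'o' before it.
if ($with_n =~ m/(to*?n)/) {
        print $1,"\n";
}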

~>perl ex.pl
t
toooooon
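
The remark about o*? versus o+? can be checked in the same spirit. This is a small sketch in Python's `re` module rather than Perl (an assumption made for illustration; the two engines treat these quantifiers alike):

```python
import re

text = "toooooon"

# o*? may match zero 'o's, so with nothing after it the match is just "t".
star = re.search(r"to*?", text).group()

# o+? must match at least one 'o', so the shortest possible match is "to".
plus = re.search(r"to+?", text).group()

# With a trailing 'n', both patterns need every 'o' to reach the 'n'.
star_n = re.search(r"to*?n", text).group()
plus_n = re.search(r"to+?n", text).group()

# On "tn" the difference shows up: o*? accepts zero 'o's, o+? does not.
star_tn = re.search(r"to*?n", "tn")
plus_tn = re.search(r"to+?n", "tn")

print(star, plus)                 # t to
print(star_n == plus_n == text)   # True
print(star_tn is not None, plus_tn is None)  # True True
```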



Answer 4:


A regex always does its best to match. The only thing you are doing in this case is slowing the matcher down, by making it backtrack into the /o*?/ node once for every single 'o' in "tooooon". With normal (greedy) matching, it would take as many 'o's as it can the first time through. Since the next element to match is 'n', which 'o' can never match, there is little point in using minimal matching.

Actually, when normal matching fails, it takes quite a while to fail: it has to backtrack through every 'o' until there are none left to give back. In this case I would actually use maximal (possessive) matching, /to*+n/. The 'o*+' takes all the 'o's it can and never gives any of them back, so when the match fails, it fails quickly.

Minimal RE succeeding:

'toooooon' ~~ /to*?n/

 t  o  o  o  o  o  o  n       
{t}                           match [t]
[t]                           match [o] 0 times
[t]<n>                        fail to match [n] -> retry [o]
[t]{o}                        match [o] 1 times
[t][o]<n>                     fail to match [n] -> retry [o]
[t][o]{o}                     match [o] 2 times
[t][o][o]<n>                  fail to match [n] -> retry [o]

. . . .

[t][o][o][o][o]{o}            match [o] 5 times
[t][o][o][o][o][o]<n>         fail to match [n] -> retry [o]
[t][o][o][o][o][o]{o}         match [o] 6 times
[t][o][o][o][o][o][o]{n}      match [n]

Normal RE succeeding:

(NOTE: Similar for Maximal RE)

'toooooon' ~~ /to*n/

 t  o  o  o  o  o  o  n       
{t}                           match [t]
[t]{o}{o}{o}{o}{o}{o}         match [o] 6 times
[t][o][o][o][o][o][o]{n}      match [n]

Failure of Minimal RE:

'toooooo' ~~ /to*?n/

 t  o  o  o  o  o  o

. . . .

. . . .

[t][o][o][o][o]{o}            match [o] 5 times
[t][o][o][o][o][o]<n>         fail to match [n] -> retry [o]
[t][o][o][o][o][o]{o}         match [o] 6 times
[t][o][o][o][o][o][o]<n>      fail to match [n] -> retry [o]
[t][o][o][o][o][o][o]<o>      fail to match [o] 7 times -> match failed

Failure of Normal RE:

'toooooo' ~~ /to*n/

 t  o  o  o  o  o  o       
{t}                           match [t]
[t]{o}{o}{o}{o}{o}{o}         match [o] 6 times
[t][o][o][o][o][o][o]<n>      fail to match [n] -> retry [o]
[t][o][o][o][o][o]            match [o] 5 times
[t][o][o][o][o][o]<n>         fail to match [n] -> retry [o]

. . . .

[t][o]                        match [o] 1 times
[t][o]<n>                     fail to match [n] -> retry [o]
[t]                           match [o] 0 times
[t]<n>                        fail to match [n] -> match failed

Failure of Maximal RE:

'toooooo' ~~ /to*+n/

 t  o  o  o  o  o  o
{t}                           match [t]
[t]{o}{o}{o}{o}{o}{o}         match [o] 6 times
[t][o][o][o][o][o][o]<n>      fail to match [n] -> match failed
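
The traces above can be summarised by counting how often the engine tries to match the 'n'. The sketch below is a hand-written simulation in Python, not a real regex engine: `count_n_attempts` is a hypothetical helper, it handles only the three patterns discussed here, and it assumes the match is attempted at the start of the string only (a real engine would also retry from later start positions).

```python
from typing import Tuple


def count_n_attempts(text: str, mode: str) -> Tuple[bool, int]:
    """Simulate 't', then a quantified 'o', then 'n', anchored at index 0.

    mode is "lazy" (o*?), "greedy" (o*) or "possessive" (o*+).
    Returns (matched, number of times 'n' was tried).
    """
    if not text.startswith("t"):
        return False, 0

    # Length of the run of 'o's directly after the 't'.
    max_os = 0
    while 1 + max_os < len(text) and text[1 + max_os] == "o":
        max_os += 1

    if mode == "lazy":
        candidates = range(0, max_os + 1)      # 0 o's, then 1, then 2, ...
    elif mode == "greedy":
        candidates = range(max_os, -1, -1)     # all o's, then give back one at a time
    elif mode == "possessive":
        candidates = [max_os]                  # all o's, never give any back
    else:
        raise ValueError(f"unknown mode: {mode}")

    attempts = 0
    for n_os in candidates:
        attempts += 1
        pos = 1 + n_os                         # index where 'n' must appear
        if pos < len(text) and text[pos] == "n":
            return True, attempts
    return False, attempts


succeeding = "t" + "o" * 6 + "n"
failing = "t" + "o" * 6

for mode in ("lazy", "greedy", "possessive"):
    print(mode, count_n_attempts(succeeding, mode), count_n_attempts(failing, mode))
# lazy (True, 7) (False, 7)
# greedy (True, 1) (False, 7)
# possessive (True, 1) (False, 1)
```

Under these assumptions the counts line up with the traces: the minimal pattern pays for every 'o' even when it succeeds, the normal pattern pays only when it fails, and the possessive pattern makes a single attempt either way.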



Answer 5:


The string you are searching in (the haystack as it were) does not contain the substring "ton".

It does however contain the substring "tooooooooooooon".



Source: https://stackoverflow.com/questions/246232/regular-expression-nongreedy-is-greedy
