What is the difference between `Greedy` and `Reluctant` regular expression quantifiers?

离开以前 2020-11-29 07:41

From the Pattern javadocs:

Greedy quantifiers:
X?      X, once or not at all  
X*      X, zero or more times  
X+      X, one or more times  
X{n}    X, exactly n         


        
5 Answers
  •  情书的邮戳
    2020-11-29 08:15

    Perl's handling of these quantifiers is documented in perldoc perlre:

    By default, a quantified subpattern is "greedy", that is, it will match as many times as possible (given a particular starting location) while still allowing the rest of the pattern to match. If you want it to match the minimum number of times possible, follow the quantifier with a "?". Note that the meanings don't change, just the "greediness":
        *?     Match 0 or more times, not greedily
        +?     Match 1 or more times, not greedily
        ??     Match 0 or 1 time, not greedily
        {n}?   Match exactly n times, not greedily
        {n,}?  Match at least n times, not greedily
        {n,m}? Match at least n but not more than m times, not greedily
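
    Since the question quotes the Java Pattern javadocs, here is a minimal Java sketch of the greedy versus non-greedy (reluctant) distinction described above. The class name GreedyVsReluctant, the helper firstMatch, and the sample inputs are made up for illustration; only java.util.regex.Pattern and Matcher come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the JDK or the quoted documentation.
class GreedyVsReluctant {
    // Returns the first substring of input matched by regex, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<b>bold</b> and <i>italic</i>";

        // Greedy: .+ takes as much as it can while the final > still matches.
        System.out.println(firstMatch("<.+>", input));

        // Reluctant: .+? takes as little as it can while the final > still matches.
        System.out.println(firstMatch("<.+?>", input));

        // The same contrast with a bounded quantifier.
        System.out.println(firstMatch("a{2,4}", "aaaaa"));
        System.out.println(firstMatch("a{2,4}?", "aaaaa"));
    }
}
```

    The greedy pattern returns the whole input (from the first < to the last >), the reluctant one returns only <b>. Likewise a{2,4} returns aaaa and a{2,4}? returns aa. The set of strings each pattern can match is the same; only the preferred match changes.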
    
    By default, when a quantified subpattern does not allow the rest of the overall pattern to match, Perl will backtrack. However, this behaviour is sometimes undesirable. Thus Perl provides the "possessive" quantifier form as well.
        *+     Match 0 or more times and give nothing back
        ++     Match 1 or more times and give nothing back
        ?+     Match 0 or 1 time and give nothing back
        {n}+   Match exactly n times and give nothing back (redundant)
        {n,}+  Match at least n times and give nothing back
        {n,m}+ Match at least n but not more than m times and give nothing back
    
    For instance,
       'aaaa' =~ /a++a/
    
    will never match, as the a++ will gobble up all the a's in the string and won't leave any for the remaining part of the pattern. This feature can be extremely useful to give perl hints about where it shouldn't backtrack. For instance, the typical "match a double-quoted string" problem can be most efficiently performed when written as:
       /"(?:[^"\\]++|\\.)*+"/
    
    as we know that if the final quote does not match, backtracking will not help. See the independent subexpression (?>...) for more details; possessive quantifiers are just syntactic sugar for that construct. For instance the above example could also be written as follows:
       /"(?>(?:(?>[^"\\]+)|\\.)*)"/
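
    Java's java.util.regex also accepts possessive quantifiers and the independent group (?>...), so the two Perl examples above can be tried there as well. The sketch below is only an illustration under that assumption: the class name PossessiveDemo, its helpers, and the test strings are hypothetical, and the patterns are the ones from the excerpt rewritten as Java string literals (each regex backslash is doubled).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the JDK or the quoted documentation.
class PossessiveDemo {
    // /"(?:[^"\\]++|\\.)*+"/ from the excerpt, as a Java string literal.
    static final Pattern QUOTED_POSSESSIVE =
            Pattern.compile("\"(?:[^\"\\\\]++|\\\\.)*+\"");

    // /"(?>(?:(?>[^"\\]+)|\\.)*)"/ from the excerpt, using independent groups.
    static final Pattern QUOTED_ATOMIC =
            Pattern.compile("\"(?>(?:(?>[^\"\\\\]+)|\\\\.)*)\"");

    // True if regex matches anywhere in input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Returns the first match of p in input, or null if there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Greedy a+ gives one 'a' back so the trailing 'a' can match.
        System.out.println(found("a+a", "aaaa"));
        // Possessive a++ keeps every 'a', so the trailing 'a' never matches.
        System.out.println(found("a++a", "aaaa"));
        // The independent group behaves like the possessive form.
        System.out.println(found("(?>a+)a", "aaaa"));

        // Text containing a quoted string with escaped quotes inside it.
        String text = "say \"a \\\"quoted\\\" word\" now";
        System.out.println(firstMatch(QUOTED_POSSESSIVE, text));
        System.out.println(firstMatch(QUOTED_ATOMIC, text));
    }
}
```

    The first three lines print true, false, false. Both quoted-string patterns print the same match, the full literal including its escaped inner quotes, which is consistent with the excerpt's remark that the possessive form is shorthand for the (?>...) construct.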
    
