What's the difference between \z and \Z in a regular expression, and when and how do I use each?

鱼传尺愫 2020-12-01 07:27

From http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html:

\Z  The end of the input but for the final terminator, if any
6 answers
  • 2020-12-01 07:55

    Just checked it. It looks like when Matcher.matches() is invoked (as happens behind the scenes in your code), \Z behaves like \z. However, when Matcher.find() is invoked, they behave differently, as expected. The following prints true:

    Pattern p = Pattern.compile("StackOverflow\\Z");
    Matcher m = p.matcher("StackOverflow\n");
    System.out.println(m.find());
    

    If you replace \Z with \z, it prints false.

    I find this a little surprising...

  • 2020-12-01 07:57

    Like Eyal said, it works for find() but not for matches().

    This actually makes sense. The \Z anchor itself does match the position right before the final line terminator, but the regular expression as a whole does not match, because matches() requires the pattern to match the entire text, and nothing in the pattern consumes the terminator. (\Z matches the position right before the terminator, which is not the same thing as matching the terminator itself.)

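    A pattern that also consumes the terminator does match. Note that . does not match line terminators by default, so "StackOverflow\n".matches("StackOverflow\\Z.*") still returns false; with DOTALL enabled, "StackOverflow\n".matches("(?s)StackOverflow\\Z.*") returns true.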
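
    A quick way to check which variants satisfy matches() is to run a few of them side by side. This is only a sketch: the class name MatchesWithZDemo and the helper matchesWhole are made up for illustration, and it assumes Java's default flags, under which . does not match line terminators unless DOTALL ((?s)) is set.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative, not from the thread.
final class MatchesWithZDemo {
    // Performs the same whole-input check as String.matches().
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "StackOverflow\n";

        // \Z matches before the final newline, but the newline is never consumed.
        System.out.println(matchesWhole("StackOverflow\\Z", input));       // false

        // Default flags: '.' does not match '\n', so ".*" consumes nothing here.
        System.out.println(matchesWhole("StackOverflow\\Z.*", input));     // false

        // With DOTALL, ".*" consumes the trailing newline.
        System.out.println(matchesWhole("(?s)StackOverflow\\Z.*", input)); // true

        // Consuming the newline explicitly also works.
        System.out.println(matchesWhole("StackOverflow\\Z\\n", input));    // true
    }
}
```

    In each true case the pattern consumes the newline after \Z has matched in front of it, which is what matches() needs.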

  • 2020-12-01 08:08

    Even though \Z and $ only match at the end of the string (when the option for the caret and dollar to match at embedded line breaks is off), there is one exception. If the string ends with a line break, then \Z and $ will match at the position before that line break, rather than at the very end of the string.

    This "enhancement" was introduced by Perl, and is copied by many regex flavors, including Java, .NET and PCRE. In Perl, when reading a line from a file, the resulting string will end with a line break. Reading a line from a file with the text "joe" results in the string joe\n. When applied to this string, both ^[a-z]+$ and \A[a-z]+\Z will match "joe".

    If you only want a match at the absolute very end of the string, use \z (lower case z instead of upper case Z). \A[a-z]+\z does not match joe\n. \z matches after the line break, which is not matched by the character class.

    http://www.regular-expressions.info/anchors.html

    The way I read this, "StackOverflow\n".matches("StackOverflow\\z") should return false because your pattern does not include the newline.

    "StackOverflow\n".matches("StackOverflow\\z\\n") => false
    "StackOverflow\n".matches("StackOverflow\\Z\\n") => true
    
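
    The "joe" example from the quoted text can be reproduced in Java with find(), which does not require the whole input to be consumed. This is a minimal sketch; the class name JoeAnchorDemo and the helper found are hypothetical, and default flags (no MULTILINE) are assumed.

```java
import java.util.regex.Pattern;

// Hypothetical demo class reproducing the quoted "joe" example.
final class JoeAnchorDemo {
    // True if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String line = "joe\n"; // a line as read with its trailing line break

        // $ and \Z both accept the position before the final line break.
        System.out.println(found("^[a-z]+$", line));      // true
        System.out.println(found("\\A[a-z]+\\Z", line));  // true

        // \z only accepts the very end, which lies after the line break.
        System.out.println(found("\\A[a-z]+\\z", line));  // false

        // Without a trailing line break all three anchors agree.
        System.out.println(found("\\A[a-z]+\\z", "joe")); // true
    }
}
```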
  • 2020-12-01 08:14

    I think the main problem here is the unexpected behavior of matches(): any match must consume the whole input string. Both of your examples fail because the regexes don't consume the linefeed at the end of the string. The anchors have nothing to do with it.

    In most languages, a regex match can occur anywhere, consuming all, some, or none of the input string. And Java has a method, Matcher#find(), that performs this traditional kind of match. However, the results are the opposite of what you said you expected:

    Pattern.compile("StackOverflow\\z").matcher("StackOverflow\n").find()  //false
    Pattern.compile("StackOverflow\\Z").matcher("StackOverflow\n").find()  //true
    

    In the first example, the \z needs to match the end of the string, but the trailing linefeed is in the way. In the second, the \Z matches before the linefeed, which is at the end of the string.
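
    To see where each match stops, it can help to print the end offset that find() reports and compare it with the input length. The sketch below does that; the class name FindVsMatchesDemo and the helper firstMatchEnd are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
final class FindVsMatchesDemo {
    // End offset of the first find() match, or -1 if there is none.
    static int firstMatchEnd(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.end() : -1;
    }

    public static void main(String[] args) {
        String input = "StackOverflow\n";
        System.out.println(input.length());                                // 14

        // \Z matches at offset 13, just before the linefeed.
        System.out.println(firstMatchEnd("StackOverflow\\Z", input));     // 13

        // \z needs offset 14, but the literal text ends at 13: no match.
        System.out.println(firstMatchEnd("StackOverflow\\z", input));     // -1

        // Once the linefeed is consumed, both anchors match at offset 14.
        System.out.println(firstMatchEnd("StackOverflow\\n\\z", input));  // 14
        System.out.println(firstMatchEnd("StackOverflow\\n\\Z", input));  // 14
    }
}
```

    The \Z match ends at 13 while the input is 14 characters long, which is why the same pattern fails under matches().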

  • 2020-12-01 08:16

    I think Alan Moore provided the best answer, especially the crucial point that matches() behaves as if the regex were anchored at both ends of the input (effectively \A ... \z), so the whole string has to be consumed.

    I'd also like to add a few examples and a little more explanation.

    \z matches only at the very end of the string.

    \Z also matches at the very end of the string, but if the string ends with a \n, it matches just before that \n as well.

    Consider this program:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    
    public class Main {
        public static void main(String[] args) {
            Pattern p = Pattern.compile(".+\\Z"); // some word before the end of the string
            String text = "one\ntwo\nthree\nfour\n";
            Matcher m = p.matcher(text);
            while (m.find()) {
                System.out.println(m.group());
            }
        }
    }
    

    It will find 1 match, and print "four".

    Change \Z to \z and it will not match anything: . cannot consume the final \n, and \z does not match at the position before it.

    However, this will also print four, because there's no \n at the end:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    
    public class Main {
        public static void main(String[] args) {
            Pattern p = Pattern.compile(".+\\z");
            String text = "one\ntwo\nthree\nfour";
            Matcher m = p.matcher(text);
            while (m.find()) {
                System.out.println(m.group());
            }
        }
    }
    
  • 2020-12-01 08:17

    \Z is the same as $ when MULTILINE is off: it matches at the end of the string, or just before a line break that ends the string.

    \z matches only at the very end of the string; nothing, not even a line break, may follow it.
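
    The difference between $ and \Z shows up once MULTILINE is switched on, because that flag changes $ but not \Z or \z. The following is a rough sketch of that; the class name DollarVsZDemo and the helper findAll are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
final class DollarVsZDemo {
    // Collects every find() match of the regex under the given flags.
    static List<String> findAll(String regex, int flags, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "one\ntwo\n";

        // Without MULTILINE, $ acts like \Z: only the last line qualifies.
        System.out.println(findAll("\\w+$", 0, text));                    // [two]

        // With MULTILINE, $ matches before every line break.
        System.out.println(findAll("\\w+$", Pattern.MULTILINE, text));    // [one, two]

        // \Z ignores MULTILINE and still means "end, or before final break".
        System.out.println(findAll("\\w+\\Z", Pattern.MULTILINE, text));  // [two]

        // \z is the very end; \w+ cannot reach it past the final newline.
        System.out.println(findAll("\\w+\\z", Pattern.MULTILINE, text));  // []
    }
}
```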
