Zero-length matches in Java Regex

轻奢々 2021-01-05 07:06

My code:

Pattern pattern = Pattern.compile("a?");
Matcher matcher = pattern.matcher("ababa");
while(matcher.find()){
   System.out.println(matcher.star         


        
2 answers
  • 2021-01-05 07:19

    Working through a few examples should make it clear how matcher.find() behaves:

    The regex engine starts at a position in the input string (here, ababa) and checks whether the pattern can match there. If it can, then (as the API documentation says):

    matcher.start() returns the start index of the match, and matcher.end() returns the offset after the last character matched.

    If the optional a cannot be matched at that position, the pattern still succeeds by matching the empty string. In that case start() and end() return the same index, which means the match has length zero.

    Consider the following examples:

            // Searching for string either "a" or ""
            Pattern pattern = Pattern.compile("a?");
            Matcher matcher = pattern.matcher("abaabbbb");
            while(matcher.find()){
               System.out.println(matcher.start()+"["+matcher.group()+"]"+matcher.end());
            }
    

    Output:

        0[a]1
        1[]1
        2[a]3
        3[a]4
        4[]4
        5[]5
        6[]6
        7[]7
        8[]8
    
    
        // Searching for string either "aa" or "a"
        Pattern pattern = Pattern.compile("aa?");
        Matcher matcher = pattern.matcher("abaabbbb");
        while(matcher.find()){
            System.out.println(matcher.start()+"["+matcher.group()+"]"+matcher.end());
        }
    

    Output:

        0[a]1
        2[aa]4
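
    To tie this back to the question, here is a minimal sketch that runs the same loop over the question's input, "ababa". The class name ZeroLengthTrace and the helper trace are made up for illustration; only Pattern and Matcher come from java.util.regex.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical helper class, for illustration only.
    class ZeroLengthTrace {
        // Collects every match as "start[group]end", the same format used above.
        static List<String> trace(String regex, String input) {
            List<String> out = new ArrayList<>();
            Matcher matcher = Pattern.compile(regex).matcher(input);
            while (matcher.find()) {
                out.add(matcher.start() + "[" + matcher.group() + "]" + matcher.end());
            }
            return out;
        }

        public static void main(String[] args) {
            for (String line : trace("a?", "ababa")) {
                System.out.println(line);
            }
        }
    }
    ```

    Run as written, this should print 0[a]1, 1[]1, 2[a]3, 3[]3, 4[a]5 and 5[]5, one per line. The last line is the empty match at the very end of the input.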
    
  • 2021-01-05 07:33

    The ? is a greedy quantifier, so it first tries to match one occurrence before trying zero occurrences. In your string:

    1. It starts at the first character, 'a', and tries the one-occurrence case. The 'a' matches, so it returns the first result you see.
    2. It then moves forward and finds a 'b'. The 'b' does not match the one-occurrence case, so the engine backtracks and tries the zero-occurrence case. The empty string matches, which gives you your second result.
    3. It then moves past the 'b', since no further matches are possible at that position, and starts again at your second 'a'.
    4. And so on; you get the point.

    It is a bit more complicated than that, but that is the main idea: when the one-occurrence case cannot match, the engine tries the zero-occurrence case.
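
    One way to see the order of the attempts is to compare a? with its reluctant form a??, which tries zero occurrences first. The sketch below is illustrative only; GreedyVsReluctant and trace are hypothetical names, not part of any API.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical demo class, for illustration only.
    class GreedyVsReluctant {
        // Records each match as "start[group]end".
        static List<String> trace(String regex, String input) {
            List<String> out = new ArrayList<>();
            Matcher matcher = Pattern.compile(regex).matcher(input);
            while (matcher.find()) {
                out.add(matcher.start() + "[" + matcher.group() + "]" + matcher.end());
            }
            return out;
        }

        public static void main(String[] args) {
            // Greedy: tries to consume one 'a' first, falls back to the empty match.
            System.out.println(trace("a?", "ababa"));
            // Reluctant: tries the empty match first, and that attempt always succeeds.
            System.out.println(trace("a??", "ababa"));
        }
    }
    ```

    With a?? every match should be empty (0[]0 through 5[]5), because the zero-occurrence attempt always succeeds and the engine never needs to try consuming an 'a'.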

    As for the values of start, end and group: start and end are where the match begins and ends, and group is the text that was matched. So for the first zero-occurrence match in your string, you get 1, 1 and the empty string. I am not sure this really answers your question.
