Question
I'm trying to find a pattern, String patternString1 = "(John) (.+?)";, within a larger string. The pattern consists of two groups, i.e. (John) (.+?). However, I get a completely different result just by adding a space after (.+?).
For String patternString1 = "(John) (.+?)"; (i.e. without the space), the result is:
found: John w
found: John D
found: John W
For String patternString1 = "(John) (.+?) "; (i.e. with the space), the result is:
found: John writes
found: John Doe
found: John Wayne
How can a single space make such a big difference to the result?
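import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String text
                = "John writes about this, and John Doe writes about that,"
                + " and John Wayne writes about everything.";
        String patternString1 = "(John) (.+?)"; // append a space to reproduce the second result
        Pattern pattern = Pattern.compile(patternString1);
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            System.out.println("found: " + matcher.group(1) + " " + matcher.group(2));
        }
    }
}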
Answer 1:
The +? quantifier in .+? is reluctant (or "lazy"). It matches the subpattern it quantifies one or more times, but as few times as necessary to return a valid match.
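To make "as few times as necessary" concrete, here is a minimal sketch that contrasts the reluctant .+? with its greedy counterpart .+ on a shortened version of the question's text. The class name LazyVsGreedy and the helper firstGroup2 are made up for this illustration; only the java.util.regex calls come from the standard library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class LazyVsGreedy {
    // Returns group 2 of the first match, or null if the regex does not match.
    static String firstGroup2(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String text = "John writes about this";
        // Reluctant: stops after one character because nothing else is required.
        System.out.println(firstGroup2("(John) (.+?)", text)); // prints "w"
        // Greedy: takes as many characters as it can.
        System.out.println(firstGroup2("(John) (.+)", text));  // prints "writes about this"
    }
}
```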
Your pattern is (John) (.+?), and you try to find a match in John writes about this. The regex engine finds John and captures it into Group 1, matches the space, and then .+? matches the w in writes. That satisfies the requirement of one or more characters, and nothing follows .+? in the pattern, so the match is already valid and is returned. You get John w.
Now you add a space after (.+?). John is matched and captured into Group 1 as before, and the space after it is matched as before. Then .+? consumes the minimum it needs, the single character w, and the engine tries to match the trailing space of the pattern. The next character is r, not a space, so the engine goes back to .+? and lets it consume r as well. It then checks whether i is a space, and so on: .+? is extended one character at a time until the next character is a space, which happens right after writes. Thus, the valid match for (John) (.+?) followed by a space is John writes plus the trailing space, and Group 2 holds writes.
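The two cases can be reproduced side by side. The following is a small sketch using the text from the question; the class name TrailingSpaceDemo and the helper findAll are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class TrailingSpaceDemo {
    // Collects "group1 group2" for every match of regex in text.
    static List<String> findAll(String regex, String text) {
        List<String> results = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            results.add(m.group(1) + " " + m.group(2));
        }
        return results;
    }

    public static void main(String[] args) {
        String text = "John writes about this, and John Doe writes about that,"
                + " and John Wayne writes about everything.";
        // Nothing follows .+?, so one character is enough for a valid match.
        System.out.println(findAll("(John) (.+?)", text));  // [John w, John D, John W]
        // The trailing space forces .+? to grow until a space comes next.
        System.out.println(findAll("(John) (.+?) ", text)); // [John writes, John Doe, John Wayne]
    }
}
```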
Answer 2:
Well, if you include the trailing space, you are asking the pattern to match that space as well.
John w does not match anymore, because it does not end with a space.
It has to be expanded to John writes (note that the match includes the space at the end).
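The trailing space is easy to miss when printing, because the question's code prints only groups 1 and 2. As a rough sketch (the class name WholeMatchDemo and the helper wholeAndGroup2 are invented for this example), comparing the whole match from matcher.group() with group 2 makes it visible:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class WholeMatchDemo {
    // Returns {whole match, group 2} for the first match, or an empty array if none.
    static String[] wholeAndGroup2(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            return new String[0];
        }
        return new String[] { m.group(), m.group(2) };
    }

    public static void main(String[] args) {
        String[] r = wholeAndGroup2("(John) (.+?) ", "John writes about this");
        // Brackets make the trailing space visible.
        System.out.println("[" + r[0] + "]"); // prints "[John writes ]"
        System.out.println("[" + r[1] + "]"); // prints "[writes]"
    }
}
```

The space belongs to the overall match but sits outside the parentheses, so it is not part of group 2.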
Source: https://stackoverflow.com/questions/35761836/why-adding-a-space-after-can-completely-change-the-result