Let's say I have a file, and the file contains this:
HelloxxxHelloxxxHello
I compile a pattern to look for 'Hello'
Pattern pattern = Pattern.compile("Hello");
Then I use an inputstream to read in the file and convert it into a String so that it can be regexed.
Once the matcher finds a match in the file, it indicates this, but it doesn't tell me how many matches it found; simply that it found a match within the String.
So, as the string is relatively short, and the buffer I'm using is 200 bytes, it should find three matches. However, it just simply says match, and doesn't provide me with a count of how many matches there were.
What's the easiest way of counting the number of matches that occured within the String. I've tried various for loops and using the matcher.groupCount() but I'm getting nowhere fast.
matcher.find()
does not find all matches, only the next match.
You'll have to do the following:
int count = 0;
while (matcher.find())
count++;
Btw, matcher.groupCount()
is something completely different.
Complete example:
import java.util.regex.*;
class Test {
public static void main(String[] args) {
String hello = "HelloxxxHelloxxxHello";
Pattern pattern = Pattern.compile("Hello");
Matcher matcher = pattern.matcher(hello);
int count = 0;
while (matcher.find())
count++;
System.out.println(count); // prints 3
}
}
Handling overlapping matches
When counting matches of aa
in aaaa
the above snippet will give you 2.
aaaa
aa
aa
To get 3 matches, i.e. this behavior:
aaaa
aa
aa
aa
You have to search for a match at index <start of last match> + 1
as follows:
String hello = "aaaa";
Pattern pattern = Pattern.compile("aa");
Matcher matcher = pattern.matcher(hello);
int count = 0;
int i = 0;
while (matcher.find(i)) {
count++;
i = matcher.start() + 1;
}
System.out.println(count); // prints 3
This should work for matches that might overlap:
public static void main(String[] args) {
String input = "aaaaaaaa";
String regex = "aa";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
int from = 0;
int count = 0;
while(matcher.find(from)) {
count++;
from = matcher.start() + 1;
}
System.out.println(count);
}
If you want to use Java 8 streams and are allergic to while
loops, you could try this:
public static int countPattern(String references, Pattern referencePattern) {
Matcher matcher = referencePattern.matcher(references);
return Stream.iterate(0, i -> i + 1)
.filter(i -> !matcher.find())
.findFirst()
.get();
}
Disclaimer: this only works for disjoint matches.
Example:
public static void main(String[] args) throws ParseException {
Pattern referencePattern = Pattern.compile("PASSENGER:\\d+");
System.out.println(countPattern("[ \"PASSENGER:1\", \"PASSENGER:2\", \"AIR:1\", \"AIR:2\", \"FOP:2\" ]", referencePattern));
System.out.println(countPattern("[ \"AIR:1\", \"AIR:2\", \"FOP:2\" ]", referencePattern));
System.out.println(countPattern("[ \"AIR:1\", \"AIR:2\", \"FOP:2\", \"PASSENGER:1\" ]", referencePattern));
System.out.println(countPattern("[ ]", referencePattern));
}
This prints out:
2
0
1
0
This is a solution for disjoint matches with streams:
public static int countPattern(String references, Pattern referencePattern) {
return StreamSupport.stream(Spliterators.spliteratorUnknownSize(
new Iterator<Integer>() {
Matcher matcher = referencePattern.matcher(references);
int from = 0;
@Override
public boolean hasNext() {
return matcher.find(from);
}
@Override
public Integer next() {
from = matcher.start() + 1;
return 1;
}
},
Spliterator.IMMUTABLE), false).reduce(0, (a, c) -> a + c);
}
来源:https://stackoverflow.com/questions/7378451/java-regex-match-count