Question
I'm using a java.util.Scanner to scan all occurrences of a given regex from a big string.
Scanner sc = new Scanner(body);
sc.useDelimiter("");
String match = "";
while (match != null)
{
    match = sc.findWithinHorizon(pattern, 0);
    if (match == null) break;
    MatchResult mr = sc.match();
    System.out.println("Match string: " + mr.group());
    // Expected to print the same text as mr.group()
    System.out.println("Match string using indexes: " + body.substring(mr.start(), mr.end()));
}
The strange thing is that after a certain number of scans, the group() method returns the correct occurrence, while the start() and end() methods return wrong indexes, as if the scan had restarted from the beginning of the file. The regex is multiline (I use this regex to detect a line break: "\r\n|[\n\r\u2028\u2029\u0085]").
Do you have any hints? Could it be related to the "horizon" parameter? (I've tried different combinations for that value.)
For more details: it seems related to the size of the input (more than 1000 characters). After about 1000 characters the counter restarts from 0 (e.g. the first wrong index occurrence after 1003:1020 becomes 3:120).
Answer 1:
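Scanner uses an internal buffer of 1024 characters, so the offsets returned by match() appear to be relative to that buffer rather than to the original string. Use Pattern and Matcher directly instead: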
Matcher matcher = Pattern.compile(...).matcher(body);
while (matcher.find()) {
    int start = matcher.start();
}
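As a fuller illustration of the Matcher approach, here is a minimal sketch that collects the start and end offsets of every line break in a string. It reuses the line-break regex from the question. The class name LineBreakOffsets and the helper findLineBreaks are hypothetical names chosen for this example, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class LineBreakOffsets {
    // Line-break regex copied from the question.
    private static final Pattern LINE_BREAK =
            Pattern.compile("\r\n|[\n\r\u2028\u2029\u0085]");

    // Returns one {start, end} pair per line break found in body.
    // The Matcher runs over the whole string, so the offsets are
    // relative to body itself.
    static List<int[]> findLineBreaks(String body) {
        List<int[]> offsets = new ArrayList<>();
        Matcher matcher = LINE_BREAK.matcher(body);
        while (matcher.find()) {
            offsets.add(new int[] { matcher.start(), matcher.end() });
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Build an input well past 1024 characters.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 300; i++) {
            sb.append("line ").append(i).append("\r\n");
        }
        String body = sb.toString();

        for (int[] span : findLineBreaks(body)) {
            // The text at the reported offsets should be the matched line break.
            String viaIndexes = body.substring(span[0], span[1]);
            if (!viaIndexes.equals("\r\n")) {
                throw new AssertionError("Unexpected text at " + span[0] + ":" + span[1]);
            }
        }
        System.out.println("All offsets point at the matched text.");
    }
}
```

Because the Matcher is created once over the full string, the same substring check that failed with Scanner in the question should hold here for matches beyond the first 1024 characters as well.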
Source: https://stackoverflow.com/questions/12401936/java-java-util-regex-matchresult-counter-problems-with-scanner