Java java.util.regex.MatchResult counter problems with Scanner

人盡茶涼 submitted on 2019-12-24 13:25:57

Question


I'm using a java.util.Scanner to find all occurrences of a given regex in a big string.

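import java.util.Scanner;
import java.util.regex.MatchResult;

Scanner sc = new Scanner(body);
sc.useDelimiter("");
// a horizon of 0 means the search is not bounded to a fixed number of characters
while (sc.findWithinHorizon(pattern, 0) != null)
{
    MatchResult mr = sc.match();
    System.out.println("Match string: " + mr.group());
    // start()/end() are used here as indexes into the original string
    System.out.println("Match string using indexes: " + body.substring(mr.start(), mr.end()));
}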

The strange thing is that after a certain number of matches, the group() method still returns the correct occurrence, while the start() and end() methods return wrong indexes, as if the scan had restarted from the beginning of the file. The input spans many lines, and the regex detects line breaks: "\r\n|[\n\r\u2028\u2029\u0085]".
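For reference, a small sketch of what that pattern matches (the class `LineBreakRegexDemo` and method `countLineBreaks` are hypothetical names, not from the question). It compiles the same alternation using regex escapes and counts matches with a plain `Matcher`; because `\r\n` is the first alternative, a Windows line ending is counted as one break rather than two.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineBreakRegexDemo {
    // Same alternation as in the question, written with regex escapes.
    // "\r\n" is tried first, so a CRLF pair is consumed as a single match.
    static final Pattern LINE_BREAK =
            Pattern.compile("\\r\\n|[\\n\\r\\u2028\\u2029\\u0085]");

    // Hypothetical helper: counts line terminators in the given text.
    static int countLineBreaks(String text) {
        Matcher m = LINE_BREAK.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // CRLF, LF, CR and U+2028 LINE SEPARATOR: four breaks in total
        System.out.println(countLineBreaks("a\r\nb\nc\rd\u2028e")); // prints 4
    }
}
```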

Do you have any hints? Could it be related to the "horizon" parameter? (I've tried different values for it.)

More specifically, it seems related to the size of the file (more than 1000 characters): after about 1000 characters the counter restarts from 0 (e.g. the first wrong occurrence, after 1003:1020, becomes 3:120).
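A standalone reproduction sketch, assuming a synthetic input of 200 short lines (well over 1000 characters); the names `ScannerOffsetRepro`, `scannerStarts` and `matcherStarts` are hypothetical. It runs the question's `findWithinHorizon(pattern, 0)` / `match()` loop and a plain `Matcher` loop over the same string, then prints the first match where the two start offsets disagree. Where exactly they diverge depends on the JDK's internal buffering, so no expected output is given.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScannerOffsetRepro {
    static final Pattern LINE_BREAK =
            Pattern.compile("\\r\\n|[\\n\\r\\u2028\\u2029\\u0085]");

    // Start offsets as reported by Scanner.match(), using the question's loop.
    static List<Integer> scannerStarts(String body) {
        List<Integer> starts = new ArrayList<>();
        Scanner sc = new Scanner(body);
        while (sc.findWithinHorizon(LINE_BREAK, 0) != null) {
            MatchResult mr = sc.match();
            starts.add(mr.start());
        }
        sc.close();
        return starts;
    }

    // Reference offsets from a Matcher over the whole string.
    static List<Integer> matcherStarts(String body) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = LINE_BREAK.matcher(body);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // Hypothetical input: 200 lines like "line 42\n".
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("line ").append(i).append('\n');
        }
        String body = sb.toString();

        List<Integer> fromScanner = scannerStarts(body);
        List<Integer> fromMatcher = matcherStarts(body);
        int n = Math.min(fromScanner.size(), fromMatcher.size());
        for (int i = 0; i < n; i++) {
            int reported = fromScanner.get(i);
            int actual = fromMatcher.get(i);
            if (reported != actual) {
                System.out.println("First divergence at match #" + i
                        + ": Scanner reports " + reported + ", actual offset is " + actual);
                break;
            }
        }
    }
}
```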


Answer 1:


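Scanner reads its input through an internal buffer of 1024 characters, and the start() and end() values of the MatchResult returned by match() are positions in that buffer, not in the original string. Once Scanner discards already-scanned input to refill the buffer, those positions no longer line up with body. Use Pattern and Matcher on the whole string instead: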

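import java.util.regex.Matcher;
import java.util.regex.Pattern;
Matcher matcher = Pattern.compile(...).matcher(body);
while (matcher.find()) {
    // start() and end() are offsets into body itself
    int start = matcher.start();
    int end = matcher.end();
}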
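A self-contained sketch of that approach, assuming the question's line-break regex in place of the elided pattern (the class `LineBreakOffsets` and method `lineBreaks` are hypothetical). It collects a start/end pair for every match; since the Matcher works on the complete string, the offsets stay valid however long the input is.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineBreakOffsets {
    // Assumed pattern: the line-terminator regex from the question.
    static final Pattern LINE_BREAK =
            Pattern.compile("\\r\\n|[\\n\\r\\u2028\\u2029\\u0085]");

    // Returns {start, end} pairs for every line terminator in body.
    // The Matcher sees the whole string, so the offsets index body directly.
    static List<int[]> lineBreaks(String body) {
        List<int[]> result = new ArrayList<>();
        Matcher matcher = LINE_BREAK.matcher(body);
        while (matcher.find()) {
            result.add(new int[] {matcher.start(), matcher.end()});
        }
        return result;
    }

    public static void main(String[] args) {
        String body = "first\r\nsecond\nthird";
        for (int[] range : lineBreaks(body)) {
            System.out.println(range[0] + ":" + range[1]); // prints 5:7, then 13:14
        }
    }
}
```

On Java 9 and later, matcher.results() returns the same matches as a Stream<MatchResult>; the explicit find() loop also works on Java 8.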


Source: https://stackoverflow.com/questions/12401936/java-java-util-regex-matchresult-counter-problems-with-scanner
