Regex fails to capture all groups

Submitted by 社会主义新天地 on 2019-12-02 21:47:33

Question


Using java.util.regex (JDK 1.6), the regular expression 201210(\d{5,5})Test applied to the subject string 20121000002Test only captures group(0) and does not capture group(1) (the substring 00002) as it should, given the code below:

Pattern p1 = Pattern.compile("201210(\\d{5,5})Test");
Matcher m1 = p1.matcher("20121000002Test");

if(m1.find()){
    for(int i = 1; i<m1.groupCount(); i++){
        System.out.println("number = "+m1.group(i));
    }
}

Curiously, another similar regular expression like 201210(\d{5,5})Test(\d{1,10}) applied to the subject string 20121000002Test0000000099 captures group 0 and 1 but not group 2.

By contrast, with JavaScript's RegExp object, the exact same regular expressions applied to the exact same subject strings capture all groups, as one would expect. I checked and re-checked this myself using these online testers:

  • http://www.regular-expressions.info/javascriptexample.html
  • http://www.regextester.com/

Am I doing something wrong here? Or is it that Java's regex library really sucks?


Answer 1:


m1.groupCount() returns the number of capturing groups, i.e. 1 in your first case, so the condition i<m1.groupCount() is false on the first check and the loop for(int i = 1; i<m1.groupCount(); i++) is never entered.

It should be for(int i = 1; i<=m1.groupCount(); i++)
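A minimal sketch of the corrected loop, applied to both patterns from the question. The class name GroupLoopDemo and the helper capturedGroups are made up for illustration; only Pattern, Matcher, find(), group(int) and groupCount() come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupLoopDemo {

    // Hypothetical helper: returns the text of groups 1..groupCount() for the
    // first match, or an empty list if the pattern does not match at all.
    public static List<String> capturedGroups(String regex, String input) {
        List<String> groups = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            // groupCount() does not count group 0, so the last valid index
            // is groupCount() itself: the bound must be <=, not <.
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // prints [00002]
        System.out.println(capturedGroups(
                "201210(\\d{5,5})Test", "20121000002Test"));
        // prints [00002, 0000000099]
        System.out.println(capturedGroups(
                "201210(\\d{5,5})Test(\\d{1,10})", "20121000002Test0000000099"));
    }
}
```

With the original bound i<m1.groupCount(), the first pattern would yield nothing and the second only 00002, which matches what the question describes.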




Answer 2:


Change the line

for(int i = 1; i<m1.groupCount(); i++){     

to

for(int i = 1; i<=m1.groupCount(); i++){      //NOTE THE = ADDED HERE    

It now works like a charm!




Answer 3:


From java.util.regex.MatchResult.groupCount:

Group zero denotes the entire pattern by convention. It is not included in this count.

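So there are groupCount() + 1 groups in total, counting group zero: iterate from 0 (or from 1, to skip the whole match) up to and including groupCount().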
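To make the counting concrete, here is a small sketch that lists every group index from 0 through groupCount() for the first pattern in the question. The class name GroupCountDemo and the method describe are hypothetical; the output format is just one arbitrary choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountDemo {

    // Hypothetical helper: reports groupCount() and every group from 0
    // (the whole match) through groupCount() (the last capturing group).
    public static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder("groupCount=" + m.groupCount());
        for (int i = 0; i <= m.groupCount(); i++) {
            sb.append(" group(").append(i).append(")=").append(m.group(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: groupCount=1 group(0)=20121000002Test group(1)=00002
        System.out.println(describe("201210(\\d{5,5})Test", "20121000002Test"));
    }
}
```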




Answer 4:


the regular expression "201210(\d{5,5})Test" applied to the subject string "20121000002Test" only captures group(0) and does not capture group(1)

Well, I can say I didn't read the manual either, but if you do, this is what it says for Matcher.groupCount():

Returns the number of capturing groups in this matcher's pattern. Group zero denotes the entire pattern by convention. It is not included in this count.




Answer 5:


for (int i = 1; i <= m1.groupCount(); i++) { 
                   ↑
              your problem


Source: https://stackoverflow.com/questions/12989917/regex-fails-to-capture-all-groups
