Regex capturing group doesn't recognise group(1) despite matches() true

核能气质少年 submitted on 2019-12-11 06:55:40

Question


I'm writing some simple (I thought) regex in Java to remove an asterisk or ampersand which occurs directly next to some specified punctuation.
This was my original code:

String ptr = "\\s*[\\*&]+\\s*";
String punct1 = "[,;=\\{}\\[\\]\\)]"; //need two because bracket rules different for ptr to left or right
String punct2 = "[,;=\\{}\\[\\]\\(]";

out = out.replaceAll(ptr+"("+punct1+")|("+punct2+")"+ptr,"$1");

Instead of removing only the "ptr" part of the string, this removed the punct too! (i.e. it replaced the whole matched string with an empty string)
I examined further by doing:

String ptrStr = ".*"+ptr+"("+punct1+")"+".*|.*("+punct2+")"+ptr+".*";
Matcher m_ptrStr = Pattern.compile(ptrStr).matcher(out);

and found that:

m_ptrStr.matches() //returns true, but...
m_ptrStr.group(1) //returns null??

I have no idea what I'm doing wrong as I've used this exact method before with far more complicated regex and group(1) has always returned the captured group. There must be something I haven't been able to spot, so.. any ideas?


Answer 1:


The problem is that you have an alternation with a capturing group on each side:

(regex1)|(regex2)

At each starting position the matcher tries the first alternative; only if that fails does it try the second alternative.

However, those are still two separate groups, and only one of them takes part in any given match. The group that did not take part returns null, and this is what is happening to you here.

You therefore need to test both groups; since you have a match, exactly one of them will be non-null.
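
A minimal sketch of how that advice could be applied to the replaceAll call from the question. The class and method names (PtrCleanup, strip, stripExplicit) are made up for illustration; the pattern strings are copied from the question. The short version relies on Java substituting an empty string for a group reference whose group did not take part in the match, so "$1$2" keeps whichever punctuation character was actually captured.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answers.
final class PtrCleanup {
    static final String PTR = "\\s*[\\*&]+\\s*";
    static final String PUNCT1 = "[,;=\\{}\\[\\]\\)]"; // punctuation to the right of ptr
    static final String PUNCT2 = "[,;=\\{}\\[\\]\\(]"; // punctuation to the left of ptr
    static final Pattern P =
            Pattern.compile(PTR + "(" + PUNCT1 + ")|(" + PUNCT2 + ")" + PTR);

    // Short form: the group that did not match contributes an empty string,
    // so "$1$2" always yields the one captured punctuation character.
    static String strip(String in) {
        return P.matcher(in).replaceAll("$1$2");
    }

    // Explicit form: test both groups and keep the one that is not null.
    static String stripExplicit(String in) {
        Matcher m = P.matcher(in);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String kept = (m.group(1) != null) ? m.group(1) : m.group(2);
            m.appendReplacement(sb, Matcher.quoteReplacement(kept));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String in = "foo(*a, b*)";
        System.out.println(P.matcher(in).replaceAll("$1")); // prints fooa, b)
        System.out.println(strip(in));                      // prints foo(a, b)
        System.out.println(stripExplicit(in));              // prints foo(a, b)
    }
}
```

The first line of output reproduces the symptom from the question: "(*" is matched by the second alternative, group 1 is unset, and "$1" replaces the whole match with nothing.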




Answer 2:


When you have | in your pattern, that means that the matcher is allowed to match one of two patterns. Whichever one it matches, any capture groups for the pattern it matches will return the substrings--but any capture groups for the other pattern will return null, because the other pattern wasn't really matched.

It looks like your pattern is

.*\s*[\*&]+\s*([,;=\{}\[\]\)]).*|.*([,;=\{}\[\]\(])\s*[\*&]+\s*.*
------------- left -------------- ------------- right -------------

If matches() returns true, then either your string matched the "left" pattern, in which case group(1) will be non-null and group(2) will be null; or else it matched the "right" pattern, in which case group(1) will be null and group(2) non-null. [Note: The matcher will not try to find out if both sides are successful matches. That is, if the left side matches, it won't check the right side.]
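
The left/right behaviour can be checked with a small probe, sketched here under the assumption that the pattern is built exactly as ptrStr is in the question. GroupProbe and side are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical probe class, not part of the original question or answers.
final class GroupProbe {
    static final String PTR = "\\s*[\\*&]+\\s*";
    static final String PUNCT1 = "[,;=\\{}\\[\\]\\)]";
    static final String PUNCT2 = "[,;=\\{}\\[\\]\\(]";
    // Same construction as ptrStr in the question.
    static final Pattern FULL = Pattern.compile(
            ".*" + PTR + "(" + PUNCT1 + ")" + ".*|.*(" + PUNCT2 + ")" + PTR + ".*");

    // Reports which alternative produced the match: "left", "right" or "none".
    static String side(String in) {
        Matcher m = FULL.matcher(in);
        if (!m.matches()) {
            return "none";
        }
        // On a successful match exactly one of the two groups is non-null.
        return (m.group(1) != null) ? "left" : "right";
    }

    public static void main(String[] args) {
        System.out.println(side("p*)"));  // prints left
        System.out.println(side("f(*p")); // prints right
        System.out.println(side("(*)"));  // prints left
        System.out.println(side("abc"));  // prints none
    }
}
```

The input "(*)" could satisfy either alternative, but the left one succeeds first, so group(2) stays null, as the note above describes.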



Source: https://stackoverflow.com/questions/22510793/regex-capturing-group-doesnt-recognise-group1-despite-matches-true