Matcher not finding overlapping words?

别等时光非礼了梦想. Submitted on 2019-12-04 06:02:31

Question


I'm trying to take a string:

String s = "This is a String!";

And return all 2-word pairs within that string. Namely:

{"This is", "is a", "a String"}

But right now, all I can get it to do is return:

{"This is", "a String"}

How can I define my while loop such that I can account for this lack of overlapping words? My code is as follows: (Really, I'd be happy with it just returning an int representing how many string subsets it found...)

int count = 0;
while(matcher.find()) {
    count += 1;
}

Thanks all.


Answer 1:


I like the two answers already posted, counting words and subtracting one, but if you just need a regex to find overlapping matches:

Pattern pattern = Pattern.compile("\\S+ \\S+");
Matcher matcher = pattern.matcher(inputString);
int matchCount = 0;
boolean found = matcher.find();
while (found) {
  matchCount += 1;
  // search starting after the last match began
  found = matcher.find(matcher.start() + 1);
}

In reality, you'll need to be a little more clever than simply adding 1, since trying this on "the force" will match "he force" and then "e force". Of course, this is overkill for counting words, but this may prove useful if the regex is more complicated than that.
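One way to be "a little more clever" is to stop the search from restarting in the middle of a word. The following is a minimal sketch, not the answerer's own code: it assumes words are runs of non-whitespace separated by a single space, and the class name `OverlappingPairs` and method name `findPairs` are made up for illustration. It wraps the pair in a zero-width lookahead so nothing is consumed, and uses a lookbehind so a match can only begin at the start of a word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingPairs {
    // (?<!\S)   : the previous character must not be non-whitespace,
    //             so a match can only begin at the start of a word
    // (?=(...)) : zero-width lookahead; the pair is captured in group 1
    //             without being consumed, so the next match may overlap it
    private static final Pattern PAIR =
            Pattern.compile("(?<!\\S)(?=(\\S+ \\S+))");

    // Hypothetical helper: returns every overlapping 2-word pair in input.
    static List<String> findPairs(String input) {
        List<String> pairs = new ArrayList<>();
        Matcher matcher = PAIR.matcher(input);
        while (matcher.find()) {
            pairs.add(matcher.group(1));
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(findPairs("This is a String!")); // [This is, is a, a String!]
        System.out.println(findPairs("the force"));         // [the force]
    }
}
```

If only the count is needed, `findPairs(input).size()` gives it. Because each match is zero-length, the plain `find()` loop advances on its own and no explicit start index is required.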




Answer 2:


Total pair count = Total number of words - 1

And you already know how to count total number of words.




Answer 3:


Run a for loop from i = 0 to the number of words - 2, then the words i and i+1 will make up a single 2-word string.

String[] splitString = string.split(" ");
for(int i = 0; i < splitString.length - 1; i++) {
    System.out.println(splitString[i] + " " + splitString[i+1]);
}

The number of 2-word strings within a sentence is simply the number of words minus one.

int numOfPairs = string.split(" ").length - 1;
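One caveat with splitting on a single space: consecutive or leading spaces produce empty tokens, which would show up as bogus "words". A hedged variant of the same idea, assuming any run of whitespace should count as one separator (the class name `WordPairs` and method name `pairs` are invented for this sketch), could look like this:

```java
import java.util.ArrayList;
import java.util.List;

class WordPairs {
    // Hypothetical helper: builds every adjacent 2-word pair in a sentence.
    static List<String> pairs(String sentence) {
        List<String> result = new ArrayList<>();
        // trim() avoids an empty first token when the sentence starts with
        // whitespace; "\\s+" treats any run of whitespace as one separator
        String[] words = sentence.trim().split("\\s+");
        for (int i = 0; i < words.length - 1; i++) {
            result.add(words[i] + " " + words[i + 1]);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> found = pairs("  This  is a   String!");
        System.out.println(found);        // [This is, is a, a String!]
        System.out.println(found.size()); // 3
    }
}
```

An empty or one-word input yields an empty list, so the pair count is simply the size of the returned list.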



Answer 4:


I tried it using capturing groups in the pattern.

String s = "this is a String";

Pattern pat = Pattern.compile("([^ ]+)( )([^ ]+)");
Matcher mat = pat.matcher(s);
boolean check = mat.find();
while(check){
    System.out.println(mat.group());
    // restart the search where the second word (group 3) began
    check = mat.find(mat.start(3));
}

In the pattern ([^ ]+)( )([^ ]+), the groups are:

group(0): the whole match, ([^ ]+)( )([^ ]+)
group(1): the first word, ([^ ]+)
group(2): the single space, ( )
group(3): the second word, ([^ ]+)

Restarting the search at mat.start(3) makes the second word of each match the first word of the next one, which is how the overlapping pairs are found.



Source: https://stackoverflow.com/questions/12470918/matcher-not-finding-overlapping-words
