Question
I have a sentence and a set of words, say "Mayweather", "undefeated", etc. I want to:
- check whether the sentence contains any of those words (matching whole words only, i.e. ignoring full stops, commas and newlines);
- if it does, display a few words before and after each matching word, perhaps using String.format().
Here's my code, which works but doesn't produce quite what I want:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String sentence = "Floyd Mayweather Jr is an American professional boxer " +
        "currently undefeated as a professional and is a five-division world champion, " +
        "having won ten world titles and the lineal championship in four different weight classes.";
String newText = "";
Pattern p = Pattern.compile("(Mayweather) .* (undefeated)");
Matcher m = p.matcher(sentence);
if (m.find()) {
    String group1 = m.group(1);
    String group2 = m.group(2);
    newText = String.format("%s ... %s", group1, group2);
    System.out.println(newText);
}
The output now is:
Mayweather ... undefeated
What I want is something like this:
Floyd Mayweather Jr is an American ... currently undefeated as a professional ...
Can you please tell me how to do this, or point me in the right direction? I'm stuck.
Thanks in advance, guys.
Answer 1:
If you really want to solve this with a regex, your capturing groups need to match everything you want to output. Currently they match only your search terms:
(Mayweather) .* (undefeated)
// "Mayweather", "undefeated"
You could try something like this (using only one group!), but that would match your whole example:
(.*Mayweather.*undefeated.*)
// -whole text-
This can be changed to match the two parts separately again, each with at most 12 characters before and after (don't put spaces around the "match all" in the middle, and make it non-greedy!):
(.{0,12}Mayweather.{0,12}).*?(.{0,12}undefeated.{0,12})
// "Floyd Mayweather Jr is an Am", "r currently undefeated as a profes"
This can be refined further to stop at word boundaries (the results will need to be trimmed):
(\b.{0,12}Mayweather.{0,12}\b).*?(\b.{0,12}undefeated.{0,12}\b)
// "Floyd Mayweather Jr is an ", " currently undefeated as a "
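For reference, a minimal sketch of how the word-boundary version might be used from Java. Inside a Java string literal `\b` has to be written `\\b`. The class and method names `ContextRegex` and `snippet` are made up for illustration; the two groups are trimmed and combined with String.format() to get close to the output the question asks for.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContextRegex {
    // Word-boundary version of the pattern; backslashes are doubled in Java string literals.
    // The lazy .*? in the middle skips everything between the two snippets.
    private static final Pattern P = Pattern.compile(
            "(\\b.{0,12}Mayweather.{0,12}\\b).*?(\\b.{0,12}undefeated.{0,12}\\b)");

    /** Returns "group1 ... group2 ..." built from the trimmed groups, or null if there is no match. */
    static String snippet(String sentence) {
        Matcher m = P.matcher(sentence);
        if (!m.find()) {
            return null;
        }
        // The groups can start or end with a space, so trim them before formatting
        return String.format("%s ... %s ...", m.group(1).trim(), m.group(2).trim());
    }

    public static void main(String[] args) {
        String sentence = "Floyd Mayweather Jr is an American professional boxer "
                + "currently undefeated as a professional and is a five-division world champion, "
                + "having won ten world titles and the lineal championship in four different weight classes.";
        System.out.println(snippet(sentence));
    }
}
```

With the example sentence this prints `Floyd Mayweather Jr is an ... currently undefeated as a ...`.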
Changing this to output a fixed number of words is left as an exercise for the bored reader.
EDIT: Fixed greediness of ".*" in last two versions (added "?").
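One possible take on that exercise, offered as a sketch rather than a definitive answer: instead of a character count, allow up to n whitespace-separated tokens on each side, and match any keyword from a list as a whole word using `\b`, so a trailing comma or full stop doesn't prevent a match. `WordContext`, `contextPattern` and `snippets` are hypothetical names, and `List.of` assumes Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class WordContext {
    /**
     * Matches any keyword as a whole word (\b on both sides) together with up to
     * n whitespace-separated tokens before and after it.
     */
    static Pattern contextPattern(List<String> keywords, int n) {
        String alternation = keywords.stream()
                .map(Pattern::quote) // treat each keyword literally
                .collect(Collectors.joining("|"));
        return Pattern.compile(
                "(?:\\S+\\s+){0," + n + "}\\b(?:" + alternation + ")\\b(?:\\s+\\S+){0," + n + "}");
    }

    /** Returns one snippet per match, scanning the text from left to right. */
    static List<String> snippets(String text, List<String> keywords, int n) {
        List<String> result = new ArrayList<>();
        Matcher m = contextPattern(keywords, n).matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String sentence = "Floyd Mayweather Jr is an American professional boxer "
                + "currently undefeated as a professional and is a five-division world champion, "
                + "having won ten world titles and the lineal championship in four different weight classes.";
        for (String s : snippets(sentence, List.of("Mayweather", "undefeated"), 3)) {
            System.out.println(String.format("... %s ...", s));
        }
    }
}
```

One limitation of this design: if a second keyword falls inside the first keyword's trailing window, it is consumed as ordinary context and is not reported as a separate snippet.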
Answer 2:
You can try the following.
Note: this is just a prototype, so don't copy and paste it directly.
String str="Floyd Mayweather Jr is an American professional boxer currently undefeated as a professional and is a five-division world champion, having won ten world titles and the lineal championship in four different weight classes.";
int firstIndex=str.indexOf("American");
int secondIndex=str.indexOf("boxer");
String group1=str.substring(0,firstIndex+"American".length()); // gives you 1st group
String group2=str.substring(secondIndex);
String newText = String.format("%s ... %s" , group1, group2);
System.out.println(newText);
Output
Floyd Mayweather Jr is an American ... boxer currently undefeated as a professional and is a five-division world champion, having won ten world titles and the lineal championship in four different weight classes.
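The substrings above are tied to the hard-coded words "American" and "boxer". A more general sketch that still relies on plain string operations rather than one large regex works on words instead of character indices: it splits the sentence on any whitespace (so newlines count as separators), strips leading and trailing punctuation before a case-insensitive comparison, and returns up to n words around every keyword. `WordWindow`, `contextAround` and `normalize` are illustrative names, and `Set.of` assumes Java 9 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class WordWindow {
    // Strips leading/trailing punctuation and lower-cases, so "Champion," compares equal to "champion"
    static String normalize(String token) {
        return token.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "").toLowerCase(Locale.ROOT);
    }

    /** For every word matching a keyword, returns that word plus up to n words on each side. */
    static List<String> contextAround(String text, Set<String> keywords, int n) {
        Set<String> wanted = new HashSet<>();
        for (String k : keywords) {
            wanted.add(normalize(k));
        }
        // Splitting on any whitespace treats spaces and newlines the same way
        String[] words = text.trim().split("\\s+");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            if (wanted.contains(normalize(words[i]))) {
                int from = Math.max(0, i - n);
                int to = Math.min(words.length, i + n + 1); // exclusive upper bound
                result.add(String.join(" ", Arrays.copyOfRange(words, from, to)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sentence = "Floyd Mayweather Jr is an American professional boxer "
                + "currently undefeated as a professional and is a five-division world champion, "
                + "having won ten world titles and the lineal championship in four different weight classes.";
        for (String snippet : contextAround(sentence, Set.of("Mayweather", "undefeated"), 3)) {
            System.out.println(String.format("... %s ...", snippet));
        }
    }
}
```

Unlike the regex variant, overlapping windows are reported separately here, because each keyword produces its own snippet regardless of its neighbours.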
Answer 3:
The issue with your code lies in how you use groups: a capturing group returns only the text it matched, i.e. the very words you were searching for in the first place.
- group(0), which can also be written as group(), is the entire match (not the entire input string).
- group(1) is the text captured by the first group, here "Mayweather".
- group(2) is the text captured by the second group, here "undefeated".
You can use the start(int group) and end(int group) methods to find the indices of your matches, and then build a new string from the pieces with basic string operations.
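To make the group and index semantics concrete, here is a small sketch (the class `GroupIndices` and the method `groupBounds` are hypothetical names) that prints the whole match and the start/end offsets of both groups. end(int) is exclusive, which is why substring(start, end) returns exactly the captured text.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupIndices {
    static final Pattern P = Pattern.compile("(Mayweather) .* (undefeated)");

    /** Returns {start(1), end(1), start(2), end(2)} for the first match, or null if none. */
    static int[] groupBounds(String sentence) {
        Matcher m = P.matcher(sentence);
        if (!m.find()) {
            return null;
        }
        return new int[] {m.start(1), m.end(1), m.start(2), m.end(2)};
    }

    public static void main(String[] args) {
        String sentence = "Floyd Mayweather Jr is an American professional boxer "
                + "currently undefeated as a professional";
        Matcher m = P.matcher(sentence);
        if (m.find()) {
            // group() == group(0): "Mayweather Jr is an American professional boxer currently undefeated"
            System.out.println(m.group());
            // end(int) is exclusive, so substring(start, end) reproduces the group exactly: true
            System.out.println(sentence.substring(m.start(1), m.end(1)).equals(m.group(1)));
            // [6, 16, 64, 74]
            System.out.println(Arrays.toString(groupBounds(sentence)));
        }
    }
}
```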
If you specifically want to use a regex, the solution would look like this:
String sentence = "Floyd Mayweather Jr is an American professional boxer " +
        "currently undefeated as a professional and is a five-division world champion, " +
        "having won ten world titles and the lineal championship in four different weight classes.";
/* A StringBuilder can be modified in place,
 * unlike a String, which is immutable. */
StringBuilder sb = new StringBuilder(sentence.length());
Pattern p = Pattern.compile("(Mayweather) .* (undefeated)");
Matcher m = p.matcher(sentence);
if (m.find()) {
    // Start (inclusive) and end (exclusive) offsets of each group
    int g1Start = m.start(1);
    int g1End = m.end(1);
    int g2Start = m.start(2);
    int g2End = m.end(2);
    // Keeps the text around the matches and puts "..." where the matched words were
    sb.append(sentence.substring(0, g1Start));
    sb.append("...");
    sb.append(sentence.substring(g1End, g2Start));
    sb.append("...");
    // Everything after "undefeated", up to and including the final "."
    sb.append(sentence.substring(g2End));
    // Only if you need a newline at the end:
    sb.append("\r\n");
    // Then the rest is simple:
    String newText = sb.toString();
    System.out.println(newText); // or textView.setText(newText); on Android
}
Hope this helps :)
Source: https://stackoverflow.com/questions/29603520/use-java-regex-to-find-multiple-matching-words-in-a-sentence