I saw this as an answer for finding repeated words in a string. But when I use it, it thinks "This" and "is" are the same and deletes the "is".
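You should have used \b(\w+)\b\s+\b\1\b instead. The \b word boundaries make the pattern match whole words only, so the "is" at the end of "This" is no longer taken as a repeat of the "is" that follows it.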
Hope this is what you want...
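For what it's worth, here is a minimal sketch of that pattern in Java, assuming you only need to collapse adjacent pairs in a single pass. The class name DedupSketch and the method collapse are made up for illustration; Pattern.compile and Matcher.replaceAll are the standard java.util.regex calls.

```java
import java.util.regex.Pattern;

public class DedupSketch {
    // Same pattern as suggested above; \b on both sides forces whole-word matches.
    private static final Pattern DUP =
            Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: replaces each "word word" pair with the first word.
    static String collapse(String text) {
        return DUP.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        // prints "This is text another"
        System.out.println(collapse("This This is text text another another"));
        // prints "This is a test" (unchanged: "is" inside "This" is not a whole word)
        System.out.println(collapse("This is a test"));
    }
}
```

Using "$1" as the replacement keeps the first copy of the word and drops the whitespace and the second copy.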
Well, the output that you have comes from this code:
import java.util.regex.*;

public class MyDup {
    public static void main(String[] args) {
        String input = "This This is text text another another";
        String output = "";
        // \b on both sides of the group forces whole-word matches.
        Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        if (!m.find())
            output = "No duplicates found, no changes made to data";
        else
        {
            // The find() call above already consumed the first match, so this
            // loop starts at the second one; the second pass below handles the first.
            while (m.find())
            {
                // Pattern.quote makes the matched text a literal, not a new regex.
                if (output.isEmpty()) { // compare contents, not references
                    output = input.replaceFirst(Pattern.quote(m.group()), m.group(1));
                } else {
                    output = output.replaceAll(Pattern.quote(m.group()), m.group(1));
                }
            }
            // If the only match was the one consumed above, start from the input.
            if (output.isEmpty()) {
                output = input;
            }
            // Second pass: picks up the skipped first match and any pair left
            // behind by a run such as "a a a".
            input = output;
            m = p.matcher(input);
            while (m.find())
            {
                output = output.replaceAll(Pattern.quote(m.group()), m.group(1));
            }
        }
        System.out.println("After removing duplicate the final string is " + output);
    }
}
Run this code and see what you get as output. That should answer your question.
In output, you are replacing each duplicate with a single word, aren't you?
When I put System.out.println(m.group() + " : " + m.group(1)); in the first if condition, I get text text : text as output, i.e. the duplicate is being replaced by a single word.
else
{
while (m.find())
{
if (output.isEmpty()) {
System.out.println(m.group() + " : " + m.group(1));
output = input.replaceFirst(m.group(), m.group(1));
} else {