How to detect duplicate words from a String in Java?

走远了吗. Submitted on 2019-11-29 08:59:46

With a regex, the best you can do is O(N^2) search time. You can easily get O(N) expected time and O(N) space by splitting the input into words and using a HashSet to detect duplicates.
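A minimal sketch of the HashSet approach, under a few assumptions that the answer does not spell out: a "word" is a run of `\w` characters, comparison is case-insensitive, and the class and method names (`DuplicateWords`, `findDuplicates`) are hypothetical.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

class DuplicateWords {
    // Returns every word that occurs more than once, in the order the repeats are first seen.
    static Set<String> findDuplicates(String text) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        // Assumption: words are separated by runs of non-word characters.
        for (String token : text.split("\\W+")) {
            if (token.isEmpty()) {
                continue; // a leading separator produces an empty first token
            }
            String word = token.toLowerCase(Locale.ROOT);
            // add() returns false when the word is already in the set
            if (!seen.add(word)) {
                duplicates.add(word);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        String phrase = "this is#$;%@;<>?|\\` p is a is Test\n of duplicate test";
        System.out.println(findDuplicates(phrase)); // prints [is, test]
    }
}
```

Each word is looked up and inserted once, so the work grows linearly with the input as long as hashing behaves as expected. The `LinkedHashSet` is only there to keep the reported order stable.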

The following Java code detects duplicate words in a String. It works even when the repeated words are separated by newlines or punctuation.

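    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class DuplicateWordDemo {
        public static void main(String[] args) {
            // (?i) ignores case; [\w\W]* spans any characters, including newlines; \1 repeats group 1
            String duplicatePattern = "(?i)\\b(\\w+)\\b[\\w\\W]*\\b\\1\\b";
            Pattern p = Pattern.compile(duplicatePattern);
            String phrase = "this is#$;%@;<>?|\\` p is a is Test\n of duplicate test";
            Matcher m = p.matcher(phrase);
            while (m.find()) {
                String val = m.group(); // the whole matched segment
                System.out.println("Matching segment is \"" + val + "\"");
                System.out.println("Duplicate word: " + m.group(1) + "\n");
            }
        }
    }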

The output of the code will be:

    Matching segment is "is#$;%@;<>?|\` p is a is"
    Duplicate word: is

    Matching segment is "Test
     of duplicate test"
    Duplicate word: Test

Here, m.group(1) returns the text captured by the first group of the pattern, which in this case is (\\w+).
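To make the difference between the whole match and group 1 concrete, here is a small sketch that reuses the same pattern and collects only `m.group(1)` for each match. The class and method names (`GroupDemo`, `repeatedWords`) are hypothetical. It also shows a limitation worth knowing: `find()` resumes after the end of the previous match, so a repeated word that lies inside an already matched segment is not reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Same pattern as in the answer above.
    static final Pattern DUP = Pattern.compile("(?i)\\b(\\w+)\\b[\\w\\W]*\\b\\1\\b");

    // Collects the repeated word (group 1) from every match found in the text.
    static List<String> repeatedWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = DUP.matcher(text);
        while (m.find()) {
            // m.group() (same as m.group(0)) is the whole matched segment;
            // m.group(1) is only the text captured by (\w+).
            words.add(m.group(1));
        }
        return words;
    }

    public static void main(String[] args) {
        // The first match is "a b a", so the search resumes at " b" and never pairs the two b's.
        System.out.println(repeatedWords("a b a b")); // prints [a]
    }
}
```

If every repeated word has to be reported, splitting the input into words and tracking them in a set avoids this problem.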
