Regex in Java for finding duplicate consecutive words

时光说笑 2020-12-14 19:26

I saw this as an answer for finding repeated words in a string, but when I use it, it treats "This" and "is" as the same word and deletes the "is".

6 answers
  •  粉色の甜心
    2020-12-14 19:48

    Try this one:

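    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // (?i) already makes the pattern case-insensitive, so no extra flag is needed.
    String pattern = "(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+";
    Pattern r = Pattern.compile(pattern);

    String input = "your string";
    Matcher m = r.matcher(input);
    // Replace every run of repeated words with its first occurrence (group 1).
    // This avoids feeding m.group() back into String.replaceAll as a regex,
    // which could also rewrite unrelated text such as "bathe theater".
    input = m.replaceAll("$1");
    System.out.println(input);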
    

    Java regular expressions are explained very well in the API documentation of the Pattern class. Here is the regular expression with spaces added to separate its parts:

    "(?i) \\b ([a-z]+) \\b (?: \\s+ \\1 \\b )+"
    
    (?i)     turn on case-insensitive matching
    \b       match a word boundary
    [a-z]+   match a word of one or more letters;
             the parentheses capture the word as a group
    \b       match a word boundary
    (?:      indicates a non-capturing group (which starts here)
    \s+      match one or more white space characters
    \1       is a back reference to the first (captured) group;
             so the word is repeated here
    \b       match a word boundary
    )+       indicates the end of the non-capturing group and
             allows it to occur one or more times
    
