Regular Expression For Duplicate Words

终归单人心 2020-11-22 11:13

I'm a regular expression newbie, and I can't quite figure out how to write a single regular expression that would "match" any duplicate consecutive words such as

13 Answers

不要未来只要你来 (OP)

2020-11-22 11:51
Since some developers arrive at this page looking for a solution that eliminates not only duplicated consecutive non-whitespace substrings but also triplicates and beyond, I'll show the adapted pattern.

Pattern: /(\b\S+)(?:\s+\1\b)+/
Replace: $1 (replaces the full match with capture group #1)
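For readers working in Python, here is a minimal sketch of the pattern and replacement above using the standard re module (an assumption; the answer itself is language-agnostic). Python writes the backreference in a replacement string as \1 rather than $1, and the helper name collapse_repeats is hypothetical.

```python
import re

# Same pattern as above: a whole non-whitespace "word", then one or more
# whitespace-separated copies of that exact word.
DUPLICATE_RUN = re.compile(r"(\b\S+)(?:\s+\1\b)+")


def collapse_repeats(text: str) -> str:
    # Python's replacement syntax uses \1 where PHP/JavaScript use $1.
    return DUPLICATE_RUN.sub(r"\1", text)


print(collapse_repeats("Paris in the the spring"))  # prints: Paris in the spring
print(collapse_repeats("go go go now"))             # prints: go now
print(collapse_repeats("the theory"))               # prints: the theory (no partial-word match)
```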

This pattern greedily matches a "whole" non-whitespace substring, then requires one or more further copies of that substring, each preceded by one or more whitespace characters (space, tab, newline, etc.).
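A small Python check of that description, assuming the standard re module: three copies of a word separated by a tab and a newline are consumed as a single run, and only capture group #1 survives the replacement. The sample string is made up for illustration.

```python
import re

DUPLICATE_RUN = re.compile(r"(\b\S+)(?:\s+\1\b)+")

# Three copies of "y", separated by a tab and a newline.
text = "x y\ty\ny z"

# group(0) is the whole run of copies; group(1) is the single copy kept.
runs = [(m.group(0), m.group(1)) for m in DUPLICATE_RUN.finditer(text)]
print(runs)

print(DUPLICATE_RUN.sub(r"\1", text))  # prints: x y z
```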

Specifically:
- \b (word boundary) characters are vital to ensure partial words are not matched.
- The second parenthetical is a non-capturing group, because this variable-width substring only needs to be matched (absorbed), not captured.
- The + (one or more) quantifier on the non-capturing group is more appropriate than *, because with * every lone word also counts as a match, so the regex engine would needlessly capture and replace singleton occurrences with themselves. This is wasteful pattern design.
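To make those three points concrete, the following hedged Python comparison (standard re module) runs the pattern next to deliberately weakened variants. The variant names are hypothetical and exist only to show what breaks; re.subn returns the new string together with the number of replacements made.

```python
import re

WITH_BOUNDARIES = re.compile(r"(\b\S+)(?:\s+\1\b)+")
NO_LEADING_B = re.compile(r"(\S+)(?:\s+\1\b)+")    # leading \b dropped
NO_TRAILING_B = re.compile(r"(\b\S+)(?:\s+\1)+")   # trailing \b dropped
STAR = re.compile(r"(\b\S+)(?:\s+\1\b)*")          # + swapped for *

# The full pattern leaves partial-word look-alikes alone.
print(WITH_BOUNDARIES.sub(r"\1", "this is the theory"))  # prints: this is the theory

# Without the leading \b, the "is" inside "this" pairs with the word "is".
print(NO_LEADING_B.sub(r"\1", "this is"))                # prints: this

# Without the trailing \b, "the" pairs with the start of "theory".
print(NO_TRAILING_B.sub(r"\1", "the theory"))            # prints: theory

# Same output, but * also "replaces" every singleton word with itself.
print(WITH_BOUNDARIES.subn(r"\1", "a b b c"))            # prints: ('a b c', 1)
print(STAR.subn(r"\1", "a b b c"))                       # prints: ('a b c', 3)
```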
Note: if you are dealing with sentences or input strings containing punctuation, the pattern will need further refinement.
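One example of the punctuation problem, sketched in Python with the standard re module: when a repeated token ends in a period, the trailing \b has no word character next to it, so the repeat survives. The sample strings are made up, and no particular refinement is implied.

```python
import re

DUPLICATE_RUN = re.compile(r"(\b\S+)(?:\s+\1\b)+")

# The captured copy is "Stop.", and \b cannot match between "." and the
# following space, so this repeat is left untouched.
attached = DUPLICATE_RUN.sub(r"\1", "Stop. Stop. Now.")
print(attached)  # prints: Stop. Stop. Now.

# Without punctuation attached to the first copy, the repeat is collapsed.
detached = DUPLICATE_RUN.sub(r"\1", "Stop Stop. Now.")
print(detached)  # prints: Stop. Now.
```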