I saw this as an answer for finding repeated words in a string. But when I use it, it treats "This" and "is" as the same word and deletes the "is":
\b(\w+)(\b\W+\1\b)*
Explanation:
\b : Any word boundary
(\w+) : Capture one or more word characters (letters, digits, underscore) as group 1
Once a word is captured, the rest of the pattern matches any immediate repeats of it.
( : Grouping starts
\b : Any word boundary
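\W+ : One or more non-word characters (spaces, punctuation)
\1 : Backreference to group 1, i.e. the same word again
\b : Word boundary, so the repeat is not matched when it is only the start of a longer word
)* : Grouping ends; the group may repeat zero or more times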
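To check the explanation against the symptom, here is a minimal sketch in Python (an assumption, since the post does not say which language or tool is being used). It runs the pattern exactly as written, then a hypothetical variant in which the \b anchors are missing and at least one repeat is required. The names `collapse`, `WITH_BOUNDARIES`, and `WITHOUT_BOUNDARIES` are illustrative only, and the variant is one possible cause of the behavior, not a confirmed diagnosis.

```python
import re

# The pattern from the post, written as a raw string so that \b and \1
# reach the regex engine unchanged.
WITH_BOUNDARIES = re.compile(r"\b(\w+)(\b\W+\1\b)*")

# Hypothetical variant: no \b anchors, and one or more repeats required.
# Without the anchors, the "is" at the end of "This" can serve as the
# "first word" of a repeated pair.
WITHOUT_BOUNDARIES = re.compile(r"(\w+)(\W+\1)+")


def collapse(pattern, text):
    """Replace each match (a word plus its repeats) with the captured word."""
    return pattern.sub(r"\1", text)


text = "This is a test test"

print(collapse(WITH_BOUNDARIES, text))     # This is a test
print(collapse(WITHOUT_BOUNDARIES, text))  # This a test
```

With the word boundaries intact, "This" and "is" stay separate and only "test test" is collapsed. In the variant, the match starts inside "This", so the standalone "is" is swallowed. If the \b anchors are being lost somewhere (for example, in a non-raw Python string "\b" is a backspace character, not a word boundary), that would produce the behavior described.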
Reference : Example