Is there a way to remove duplicate, consecutive words/phrases from a string? E.g.
[in]: foo foo bar bar foo bar
txt1 = 'this is a foo bar bar black sheep , have you any any wool woo , yes sir yes sir three bag woo wu wool'
txt2 = 'this is a sentence sentence sentence this is a sentence where phrases phrases duplicate where phrases duplicate'
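def remove_duplicate_words(txt):
    # Keep only the first occurrence of each word, preserving order.
    # Note: this drops every repeat, not just adjacent ones.
    result = []
    for word in txt.split():
        if word not in result:
            result.append(word)
    return ' '.join(result)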
Output:
In [7]: remove_duplicate_words(txt1)
Out[7]: 'this is a foo bar black sheep , have you any wool woo yes sir three bag wu'
In [8]: remove_duplicate_words(txt2)
Out[8]: 'this is a sentence where phrases duplicate'
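
The function above removes every repeated word, even when the repeats are far apart (e.g. the second ',' and the trailing 'wool' in txt1 are gone). If only back-to-back repeats should be collapsed, one possible approach is to compare each run of n tokens with the n tokens right after it. The sketch below is only an illustration under that assumption: the name collapse_adjacent_repeats and the max_len parameter are made up here, and tokens are simply whatever str.split() produces.

```python
def collapse_adjacent_repeats(txt, max_len=1):
    """Collapse back-to-back repeats of phrases up to max_len words long.

    Hypothetical helper for illustration; not from the original post.
    """
    tokens = txt.split()
    # Shorter phrases are handled first, so a later pass can see
    # repeats that an earlier pass created.
    for n in range(1, max_len + 1):
        kept = []
        i = 0
        while i < len(tokens):
            if tokens[i:i + n] == tokens[i + n:i + 2 * n]:
                # The next n tokens repeat this phrase: skip this copy
                # and let the following copy be kept instead.
                i += n
            else:
                kept.append(tokens[i])
                i += 1
        tokens = kept
    return ' '.join(tokens)


print(collapse_adjacent_repeats('foo foo bar bar foo bar'))
# foo bar foo bar
print(collapse_adjacent_repeats('foo foo bar bar foo bar', max_len=2))
# foo bar
```

With the default max_len=1 only repeated single words are collapsed. A larger max_len also collapses repeated phrases, including ones that only become adjacent after the single-word pass, which may or may not be the behaviour you want.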