Is there a way to remove duplicate and continuous words/phrases in a string?

广开言路 2021-01-13 11:27

Is there a way to remove consecutive duplicate words/phrases in a string? E.g.

[in]: foo foo bar bar foo bar

6 answers

臣服心动 (OP)

2021-01-13 12:04
With a pattern similar to sharcashmo's, you can use `subn`, which returns the number of replacements along with the new string, inside a while loop:
```
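import re

txt = r'this is a sentence sentence sentence this is a sentence where phrases phrases duplicate where phrases duplicate . sentence are not phrases .'

# Group 1 captures one or more space-separated words; (?: \1)+ then matches
# one or more immediate repetitions of that same word sequence.
pattern = re.compile(r'(\b\w+(?: \w+)*)(?: \1)+\b')
repl = r'\1'

res = txt

# Repeat the substitution until a pass makes no replacements.
while True:
    res, nbr = pattern.subn(repl, res)
    if nbr == 0:
        break

print(res)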
```
When there are no more replacements, the while loop stops.

This method catches all overlapping matches (which is impossible in a single replacement pass) without testing the same pattern twice.
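To make the difference between one pass and the loop concrete, here is a minimal sketch that wraps the loop above in a function and runs it on the question's input. The function name `collapse_repeats` is a hypothetical helper, not part of the original answer, and the sketch assumes words are separated by single spaces, as the pattern does.

```python
import re

# Same pattern as in the answer: a run of words followed by one or more
# immediate repetitions of that same run.
PATTERN = re.compile(r'(\b\w+(?: \w+)*)(?: \1)+\b')


def collapse_repeats(text):
    """Hypothetical helper: apply the substitution until nothing changes."""
    while True:
        text, count = PATTERN.subn(r'\1', text)
        if count == 0:
            return text


sample = 'foo foo bar bar foo bar'

# One pass only collapses the repeats that are visible in the original string.
single_pass = PATTERN.sub(r'\1', sample)
print(single_pass)

# Looping also collapses repeats that only appear after earlier replacements.
print(collapse_repeats(sample))
```

On this input a single pass leaves `foo bar foo bar`, while the loop reduces it further to `foo bar`, because the first pass creates a new repeated phrase. If the single-pass result is the one you want, call `sub` once instead of looping.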