Regex to remove repeated character pattern in a string

I have a string that may have a repeated character pattern, e.g.

'xyzzyxxyzzyxxyzzyx'

I need to write a regex that would replace such str

4 Answers

情话喂你

2020-12-14 23:18
Here is how to write a function (using the re module) that removes all duplications:
```
import re
def remove_duplications(string):
    return re.sub(r'(.+?)\1+', r'\1', string)
```
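As a quick check of how this function behaves, here is a small sketch. The input 'hello' is made up for illustration and is not from the question. Because the pattern is not anchored, it also collapses repeats that occur in the middle of a longer string, which may or may not be what you want.

```python
import re

def remove_duplications(string):
    # Replace every run of a repeated unit with a single copy of that unit.
    return re.sub(r'(.+?)\1+', r'\1', string)

# The string from the question is one unit repeated three times.
print(remove_duplications('xyzzyxxyzzyxxyzzyx'))  # xyzzyx

# Illustrative input: the doubled 'l' in the middle is collapsed as well.
print(remove_duplications('hello'))  # helo
```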
青春惊慌失措

2020-12-14 23:23
Since you want the smallest repeating pattern, something like the following should work for you:
```
re.sub(r'^(.+?)\1+$', r'\1', input_string)
```
The ^ and $ anchors make sure you don't get matches in the middle of the string, and by using .+? instead of just .+ you will get the shortest pattern (compare results using a string like 'aaaaaaaaaa').
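The comparison suggested above can be tried directly. This is a minimal sketch; the variable names and the input 'xyxyz' are chosen here for illustration.

```python
import re

s = 'aaaaaaaaaa'  # ten 'a' characters

# Lazy group: the shortest unit that tiles the whole string.
lazy = re.sub(r'^(.+?)\1+$', r'\1', s)
# Greedy group: the longest unit that still repeats at least once.
greedy = re.sub(r'^(.+)\1+$', r'\1', s)
print(lazy)    # a
print(greedy)  # aaaaa

# With the anchors, a string that is not one unit repeated end to end
# does not match, so re.sub returns it unchanged.
partial = re.sub(r'^(.+?)\1+$', r'\1', 'xyxyz')
print(partial)  # xyxyz
```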
醉酒成梦

2020-12-14 23:23
Try this regex pattern and capture the first group:
```
^(.+?)\1+$
```
- ^ anchor for beginning of string/line
- . any character except newlines
- + quantifier to denote at least 1 occurrence
- ? makes the + lazy instead of greedy, hence giving you the shortest pattern
- () capturing group
- \1+ backreference with quantifier to denote that the pattern should repeat at least once
- $ anchor for end of string/line
Test it here: Rubular
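In Python, capturing the first group of this pattern could look like the sketch below. The helper name smallest_repeating_unit and the non-matching input 'abcabd' are hypothetical and used only for illustration.

```python
import re

def smallest_repeating_unit(text):
    # Hypothetical helper: return the shortest unit that, repeated at least
    # twice, makes up the whole string, or None if there is no such unit.
    match = re.match(r'^(.+?)\1+$', text)
    return match.group(1) if match else None

print(smallest_repeating_unit('xyzzyxxyzzyxxyzzyx'))  # xyzzyx
print(smallest_repeating_unit('abcabd'))              # None
```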

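The above solution does a lot of backtracking, which affects performance. If you know which characters are not allowed in these strings, you can use a negated character set, which can cut down on backtracking. Note that the quantifier in the pattern below is greedy, so it captures the longest repeating unit rather than the shortest. For example, if whitespace is not allowed, then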
```
^([^\s]+)\1+$
```
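One thing worth checking with this variant is which unit ends up in the group. The sketch below compares the greedy pattern above with a lazy version of the same character set. The test strings are made up for illustration.

```python
import re

s = 'abababab'  # the unit 'ab' repeated four times

# Greedy: the group grabs as much as it can while still leaving one repeat.
greedy = re.match(r'^([^\s]+)\1+$', s).group(1)
# Lazy: the group stops at the shortest unit that tiles the string.
lazy = re.match(r'^([^\s]+?)\1+$', s).group(1)
print(greedy)  # abab
print(lazy)    # ab

# Whitespace is excluded by the character set, so this does not match.
no_match = re.match(r'^([^\s]+)\1+$', 'ab ab')
print(no_match)  # None
```

If the shortest unit is what you need, the lazy form is the one to use with the negated set.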
情书的邮戳

2020-12-14 23:27
Use the following:
```
>>> import re
>>> re.sub(r'(.+?)\1+', r'\1', 'xyzzyxxyzzyxxyzzyx')
'xyzzyx'
>>> re.sub(r'(.+?)\1+', r'\1', 'abcbaccbaabcbaccbaabcbaccba')
'abcbaccba'
>>> re.sub(r'(.+?)\1+', r'\1', 'iiiiiiiiiiiiiiiiii')
'i'
```
It basically matches a pattern that repeats itself, (.+?)\1+, and replaces the whole match with the repeating unit, which is captured in the first group, \1. Also note that using a reluctant quantifier here, i.e., +?, will make the regex backtrack quite a lot.

DEMO.