Regex to remove repeated character pattern in a string

梦毁少年i 2020-12-14 22:30

I have a string that may have a repeated character pattern, e.g.

'xyzzyxxyzzyxxyzzyx'

I need to write a regex that would replace such str

4 answers
  •  情书的邮戳
    2020-12-14 23:27

    Use the following:

    >>> import re
    >>> re.sub(r'(.+?)\1+', r'\1', 'xyzzyxxyzzyxxyzzyx')
    'xyzzyx'
    >>> re.sub(r'(.+?)\1+', r'\1', 'abcbaccbaabcbaccbaabcbaccba')
    'abcbaccba'
    >>> re.sub(r'(.+?)\1+', r'\1', 'iiiiiiiiiiiiiiiiii')
    'i'
    

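    The pattern (.+?)\1+ matches a substring that repeats itself back to back, and the substitution replaces the whole run with a single copy of the repeating unit, which is captured in the first group \1. Note that the reluctant quantifier +? makes the regex try the shortest candidate unit first and grow it one character at a time, so it can backtrack quite a lot on long inputs.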
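As a rough sketch of how this could be wrapped up for reuse: the helper names `collapse_repeats` and `smallest_period` below are made up for illustration and are not part of the answer above. The first applies the same substitution; the second assumes you only want a result when the whole string is one repeated unit, and uses `re.fullmatch` for that.

```python
import re

# Same pattern as in the answer: a lazily matched unit followed by
# one or more back-to-back copies of itself.
REPEAT = re.compile(r'(.+?)\1+')


def collapse_repeats(text):
    """Replace every run of a repeated substring with a single copy.

    Hypothetical helper: it collapses any repeated run it finds,
    not only a repetition that spans the whole string.
    """
    return REPEAT.sub(r'\1', text)


def smallest_period(text):
    """Return the shortest unit whose repetition makes up the whole string.

    Hypothetical helper: if the string is not a pure repetition,
    it is returned unchanged.
    """
    match = REPEAT.fullmatch(text)
    return match.group(1) if match else text


print(collapse_repeats('xyzzyxxyzzyxxyzzyx'))  # xyzzyx
print(collapse_repeats('aabb'))                # ab
print(smallest_period('aabb'))                 # aabb
```

The difference shows up on input such as 'aabb': the plain substitution collapses each run separately, while the anchored version leaves the string alone because no single unit repeats across all of it. Also keep in mind that `.` does not match a newline unless `re.DOTALL` is set.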

