Regex to remove repeated character pattern in a string

梦毁少年i 2020-12-14 22:30

I have a string that may have a repeated character pattern, e.g.

'xyzzyxxyzzyxxyzzyx'

I need to write a regex that would replace such str

4 answers
  •  醉酒成梦
    2020-12-14 23:23

    Try this regex pattern and capture the first group:

    ^(.+?)\1+$
    
    • ^ anchor for beginning of string/line
    • . any character except newlines
    • + quantifier to denote at least 1 occurrence
    • ? makes the + lazy instead of greedy, hence giving you the shortest pattern
    • () capturing group
    • \1+ backreference with quantifier to denote that the captured pattern should repeat at least once more
    • $ anchor for end of string/line
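
A minimal sketch of the pattern above in Python, assuming the goal is to reduce the string to its shortest repeating unit. The helper name collapse_repeats is hypothetical and not part of the original answer.

```python
import re

# Same pattern as above: shortest unit (lazy +?) followed by one or more copies of itself.
REPEAT = re.compile(r"^(.+?)\1+$")


def collapse_repeats(s: str) -> str:
    """Return the shortest repeating unit of s, or s unchanged if it does not repeat."""
    m = REPEAT.match(s)
    return m.group(1) if m else s


print(collapse_repeats("xyzzyxxyzzyxxyzzyx"))  # xyzzyx
print(collapse_repeats("abcabd"))              # abcabd (no repetition, left as-is)
```

Because the group must be followed by at least one copy of itself, a string that consists of a single unit does not match and is returned unchanged.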

    Test it here: Rubular


    The above solution does a lot of backtracking, which hurts performance. If you know which characters are not allowed in these strings, you can use a negated character set, which can reduce backtracking. For example, if whitespace is not allowed:

    ^([^\s]+)\1+$
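
One thing worth checking before adopting the negated-set variant: its + is greedy, so the group captures the longest repeating unit rather than the shortest. The sketch below compares it with a lazy version in Python; the names GREEDY and LAZY are illustrative only, and the lazy variant is an assumption about what is wanted, not part of the original answer.

```python
import re

# Pattern from the answer (greedy) and a lazy variant for comparison.
GREEDY = re.compile(r"^([^\s]+)\1+$")
LAZY = re.compile(r"^([^\s]+?)\1+$")

s = "abababab"
greedy_unit = GREEDY.match(s).group(1)
lazy_unit = LAZY.match(s).group(1)

print(greedy_unit)  # abab (longest unit that still repeats)
print(lazy_unit)    # ab   (shortest unit)

# Whitespace is excluded from the group, so this string is rejected.
print(GREEDY.match("ab ab"))  # None
```

When the string is exactly three copies of the unit, as in the question's example, both variants capture the same group; they differ when a longer multiple of the unit also tiles the string.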
    
