Question
In English, you sometimes see a letter repeated many times, like this:
hello my hero hhhhhhhhhhh
That example is for h, but I want to remove any letter that repeats like this, 2 or more times in a row, and replace the run with a space. It should work for any Unicode letter; my text is Arabic. So far I can only remove one specific letter. This is my code:
# remove laughing
def remove_laughs(self, text):
    text = re.sub("ه{2,}", "", text)
    return text
Answer 1:
Try this:
from itertools import groupby

def remove_dups(s):
    replace_with = ' '
    # groupby yields each run of identical characters; runs of 2 or more become a space
    return ''.join([x if sum(1 for i in y) < 2 else replace_with for x, y in groupby(s)])
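As a rough sketch of the same groupby idea with the threshold exposed, the variant below takes the minimum run length as a parameter. The names replace_runs, min_run, and replacement are made up for illustration and do not come from the answer above.

```python
from itertools import groupby

def replace_runs(text, min_run=2, replacement=" "):
    """Replace every run of min_run or more identical characters with replacement."""
    pieces = []
    for char, group in groupby(text):
        run_length = sum(1 for _ in group)
        # Runs shorter than min_run are kept unchanged; longer runs are replaced.
        pieces.append(replacement if run_length >= min_run else char * run_length)
    return "".join(pieces)

print(repr(replace_runs("hello my hero hhhhhhhhhhh")))
print(repr(replace_runs("hello my hero hhhhhhhhhhh", min_run=3)))
```

With the default threshold of 2, the "ll" in "hello" is replaced as well, so the first call gives 'he o my hero  '. Raising min_run to 3 leaves ordinary double letters alone and gives 'hello my hero  '.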
Answer 2:
To replace any repeated character:
import re
re.sub(r'(.)\1+', ' ', 'مرحبا هههههههههه')
# 'مرحبا  ' (the original space followed by the replacement space)
To replace only repeated letters (this uses the third-party regex module, which supports \pL):
import regex
regex.sub(r'(\pL)\1+', ' ', 'مرحبا هههههههههه')
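If installing the third-party regex module is not an option, a similar result can probably be had with the standard re module. The sketch below assumes Python 3 str patterns, where \w is Unicode-aware; the class [^\W\d_] only approximates \pL (it can also match a few numeric symbols that \d does not cover), and the name replace_repeated_letters is hypothetical.

```python
import re

# [^\W\d_] means "a word character that is not a decimal digit or underscore",
# which approximates "any Unicode letter" for str patterns in Python 3.
REPEATED_LETTER = re.compile(r"([^\W\d_])\1+")

def replace_repeated_letters(text, replacement=" "):
    """Replace each run of 2 or more identical letters with replacement."""
    return REPEATED_LETTER.sub(replacement, text)

print(repr(replace_repeated_letters("مرحبا هههههههههه")))
print(repr(replace_repeated_letters("aaa 111 !!!")))
```

Repeated digits and punctuation are left untouched here, so the second call only replaces the run of "a".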
Source: https://stackoverflow.com/questions/56337626/how-to-remove-any-unicode-repeating-letter