How to remove any unicode repeating letter?

廉价感情. Submitted on 2021-01-29 17:16:18

Question


In English text you sometimes get a repeated letter, as in: hello my hero hhhhhhhhhhh. That example repeats h, but I want to handle any Unicode letter that repeats 2 or more times in a row and replace the whole run with a space. My text is Arabic. So far I can only remove one specific letter; this is my code:

import re

# Remove laughter: runs of two or more of the letter ه
def remove_laughs(self, text):
    text = re.sub("ه{2,}", "", text)
    return text

Answer 1:


Try this:

from itertools import groupby

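def remove_dups(s):
    replace_with = ' '
    # groupby yields (char, run) pairs for consecutive identical characters;
    # any run of 2 or more characters becomes a single replace_with.
    return ''.join(x if sum(1 for _ in y) < 2 else replace_with for x, y in groupby(s))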
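One thing to watch with a threshold of 2 is that ordinary double letters (the "ll" in "hello") are replaced too. The following is a minimal sketch of the same groupby idea with an adjustable threshold; the name collapse_runs and the min_run parameter are made up for illustration and are not part of the answer above.

```python
from itertools import groupby

def collapse_runs(s, min_run=2, replace_with=' '):
    """Replace each run of min_run or more identical characters with replace_with.

    collapse_runs and min_run are hypothetical names used for this sketch.
    """
    out = []
    for ch, group in groupby(s):
        run_length = sum(1 for _ in group)
        # Runs shorter than min_run are kept unchanged; longer runs collapse.
        out.append(replace_with if run_length >= min_run else ch * run_length)
    return ''.join(out)

print(repr(collapse_runs('hello my hero hhhhhhhhhhh')))             # 'he o my hero  '
print(repr(collapse_runs('hello my hero hhhhhhhhhhh', min_run=3)))  # 'hello my hero  '
```

With min_run=3 the word "hello" survives, while the long run of h is still replaced by a single space.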



Answer 2:


To replace any repeated character:

import re
re.sub(r'(.)\1+', ' ', 'مرحبا هههههههههه')
# 'مرحبا  '

To replace only repeated letters (this uses the third-party regex module, where \pL matches any Unicode letter):

import regex
regex.sub(r'(\pL)\1+', ' ', 'مرحبا هههههههههه')
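If installing the regex module is not an option, something close can be sketched with the standard re module. This is an approximation, not an exact equivalent of \pL: the class [^\W\d_] means "a word character that is not a decimal digit or an underscore", and in Python 3 it may also match a few non-letter numeric characters such as superscript digits. The names LETTER_RUN and collapse_letter_runs are made up for this example.

```python
import re

# Approximation of "any Unicode letter" using only the standard library:
# a word character (\w) that is neither a decimal digit (\d) nor an underscore.
LETTER_RUN = re.compile(r'([^\W\d_])\1+')

def collapse_letter_runs(text, replace_with=' '):
    """Replace runs of 2+ identical letters; leave digits and punctuation alone.

    Hypothetical helper for illustration only.
    """
    return LETTER_RUN.sub(replace_with, text)

print(repr(collapse_letter_runs('wait!!! 1000 hhhh')))  # 'wait!!! 1000  '
```

Here the repeated "!" and the zeros in "1000" are left untouched because they are not letters, while "hhhh" becomes a single space.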


Source: https://stackoverflow.com/questions/56337626/how-to-remove-any-unicode-repeating-letter
