Removing diacritical marks using Python

Submitted by 烈酒焚心 on 2020-06-13 05:46:50

Question


I have a couple of text files containing characters that have diacritical marks, for example è, á, ô and so on. I'd like to replace these characters with e, a, o, etc.

How can I achieve this in Python? Grateful for help!


Answer 1:


Try unidecode (you may need to install it).

>>> from unidecode import unidecode
>>> s = u"é"
>>> unidecode(s)
'e'



Answer 2:


Example of what you could do:

import unidecode

accented_string = u'Málaga'
# accented_string is of type 'unicode' (Python 2; in Python 3 it is a plain 'str')
unaccented_string = unidecode.unidecode(accented_string)
# unaccented_string contains 'Malaga' and is of type 'str'

Your problem is very similar to an existing question. See: What is the best way to remove accents in a Python unicode string?
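
If installing a third-party package is not an option, the standard library's unicodedata module can cover the common case. The following is only a minimal sketch of that alternative technique, not something taken from the answers here: it decomposes each character (NFD normalization) and drops the combining marks. The function name strip_accents is made up for illustration. Note that letters without a decomposition, such as ø, are left unchanged by this approach.

```python
import unicodedata


def strip_accents(text):
    """Return text with combining diacritical marks removed (hypothetical helper)."""
    # NFD splits a character like 'á' into the base letter 'a' plus U+0301.
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns 0 for base characters, non-zero for marks.
    return "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)


print(strip_accents("Málaga"))
```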




Answer 3:


In Python 3, you simply need to use the unidecode package. It works with both lowercase and uppercase letters.

Installing the package (you may need to use pip3 instead of pip, depending on your system and setup):

$ pip install unidecode

Then using it as follows:

from unidecode import unidecode

text = ["ÉPÍU", "Naïve Café", "EL NIÑO"]

text1 = [unidecode(s) for s in text]
print(text1)
# ['EPIU', 'Naive Cafe', 'EL NINO']

text2 = [unidecode(s.lower()) for s in text]
print(text2)
# ['epiu', 'naive cafe', 'el nino']
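
Since the question is about text files, the same conversion can be applied line by line while copying a file. This is a rough sketch under some assumptions: the files are UTF-8 encoded, and the helper name convert_file and its parameters are invented for illustration. The conversion function is passed in as an argument, so any callable that maps a string to a string can be used.

```python
def convert_file(src_path, dst_path, transform, encoding="utf-8"):
    """Write a copy of src_path to dst_path with transform applied to each line.

    Hypothetical helper: `transform` is any function taking and returning a str.
    The UTF-8 default is an assumption; pass the real encoding of your files.
    """
    with open(src_path, encoding=encoding) as src, \
            open(dst_path, "w", encoding=encoding) as dst:
        for line in src:
            # Each line keeps its trailing newline, so the layout is preserved.
            dst.write(transform(line))
```

With the import shown above, a call could look like convert_file("input.txt", "output.txt", unidecode), where the two file names are placeholders.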


Source: https://stackoverflow.com/questions/48445459/removing-diacritical-marks-using-python
