Replace special characters with ASCII equivalent


星月不相逢 2020-12-08 10:16

Is there a library that can replace special characters with their ASCII equivalents? For example, converting:

"Cześć"

to:

"Czesc"

6 Answers

陌清茗 (OP)

2020-12-08 10:44

The package unidecode worked best for me:

from unidecode import unidecode
text = "Björn, Łukasz and Σωκράτης."
print(unidecode(text))
# ==> Bjorn, Lukasz and Sokrates.

You might need to install the package:

pip install unidecode

The above solution is easier and more robust than encoding (and decoding) the output of unicodedata.normalize(), as suggested by other answers.

import unicodedata

text = "Björn, Łukasz and Σωκράτης."
# This doesn't work as expected:
ret = unicodedata.normalize('NFKD', text).encode('ascii', 'ignore')
print(ret)
# ==> b'Bjorn, ukasz and .'
# Besides not supporting all characters, the returned value is a
# bytes object in Python 3. To get a str:
ret = ret.decode("utf8")  # (not required in Python 2)
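If installing a third-party package is not an option, the standard library can get part of the way. The sketch below decomposes text with NFKD, drops the combining accents, applies a small hand-written table for letters such as "Ł" that have no decomposition, and replaces anything still non-ASCII with a visible placeholder instead of silently deleting it. The names to_ascii and FALLBACK are hypothetical, for illustration only, and the table covers just a handful of letters, so this is a rough approximation rather than a substitute for unidecode.

```python
import unicodedata

# Hypothetical, deliberately incomplete table for letters that NFKD does not
# split into "base letter + combining mark".
FALLBACK = str.maketrans({"Ł": "L", "ł": "l", "ß": "ss", "Ø": "O", "ø": "o"})


def to_ascii(text: str, placeholder: str = "?") -> str:
    """Approximate `text` in ASCII using only the standard library."""
    # Decompose accented letters, e.g. "ś" -> "s" + U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop characters with a nonzero combining class (the accents split off above).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Map the few letters covered by the fallback table.
    mapped = stripped.translate(FALLBACK)
    # Anything still outside ASCII (e.g. Greek) becomes the placeholder,
    # so lost characters stay visible instead of disappearing.
    return "".join(ch if ch.isascii() else placeholder for ch in mapped)


print(to_ascii("Cześć"))                        # Czesc
print(to_ascii("Björn, Łukasz and Σωκράτης."))  # Bjorn, Lukasz and ????????.
```

The question's example comes out right, but the Greek name turns into placeholders: accent stripping is not transliteration, which is why a dedicated package such as unidecode handles the general case better.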
