Python and character normalization

有刺的猬 2020-12-01 09:42

Hello, I retrieve UTF-8 text data from a foreign source, and it contains special characters such as u"ıöüç". I want to normalize them to English characters, such as

4 answers
  •  挽巷 (original poster)
    2020-12-01 09:56

    I recommend using the Unidecode module:

    >>> from unidecode import unidecode
    >>> unidecode(u'ıöüç')
    'iouc'
    

    Note that you feed it a Unicode string. On Python 2 it returns a byte string; on Python 3 it returns an ordinary str. With the default settings, the output contains only ASCII characters.
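
If installing a third-party package is not an option, a rough standard-library approximation is possible. The sketch below is not how Unidecode works internally. It decomposes characters with `unicodedata.normalize` and drops the combining marks. The function name `to_ascii` and the `FALLBACK` table are hypothetical, introduced only for this illustration. The table is needed because some letters, such as the dotless `ı`, have no Unicode decomposition and would otherwise be silently dropped.

```python
import unicodedata

# Hypothetical fallback table (an assumption for this sketch): letters that
# have no Unicode decomposition must be mapped by hand, e.g. dotless i.
FALLBACK = str.maketrans({"ı": "i"})


def to_ascii(text):
    """Best-effort ASCII folding using only the standard library."""
    # Apply the manual replacements first.
    text = text.translate(FALLBACK)
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with errors="ignore" discards every non-ASCII code point,
    # which removes the combining marks and anything else left unmapped.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("ıöüç"))  # prints: iouc
```

Unlike Unidecode, this sketch simply discards any character it cannot decompose and that is missing from the table, so Unidecode is likely the better choice for text in non-Latin scripts.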
