Replace special characters with ASCII equivalent

星月不相逢 2020-12-08 10:16

Is there a library that can replace special characters with their ASCII equivalents, for example:

"Cześć"

to:

"Czesc"

6 answers
  •  陌清茗 (original poster)
    2020-12-08 10:44

    The package unidecode worked best for me:

    from unidecode import unidecode
    text = "Björn, Łukasz and Σωκράτης."
    print(unidecode(text))
    # ==> Bjorn, Lukasz and Sokrates.
    

    You might need to install the package:

    pip install unidecode
    

    The above solution is easier and more robust than encoding (and decoding) the output of unicodedata.normalize(), as suggested by other answers.

    # This doesn't work as expected (text is the string defined above):
    import unicodedata
    ret = unicodedata.normalize('NFKD', text).encode('ascii', 'ignore')
    print(ret)
    # ==> b'Bjorn, ukasz and .'
    # Besides not supporting all characters (Ł and the Greek letters are
    # silently dropped), the returned value is a bytes object in python3.
    # To yield a str type:
    ret = ret.decode("utf8") # (not required in python2)
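
If installing a third-party package is not an option, the normalize() approach can be made somewhat less lossy. The following is only a minimal sketch under stated assumptions, not a replacement for unidecode: it removes combining marks instead of encoding to ASCII, and it relies on a small hand-written table for letters such as Ł that have no Unicode decomposition. The names EXTRA_MAP and strip_accents are hypothetical and not part of any library.

```python
import unicodedata

# Hypothetical, deliberately tiny table for letters that NFKD cannot
# decompose into "base letter + combining mark". Extend it as needed.
EXTRA_MAP = str.maketrans({"Ł": "L", "ł": "l", "Ø": "O", "ø": "o", "ß": "ss"})


def strip_accents(text):
    """Return text with diacritics removed, using only the standard library."""
    # First substitute the letters that have no decomposition, then decompose.
    decomposed = unicodedata.normalize("NFKD", text.translate(EXTRA_MAP))
    # combining() returns a non-zero class for combining marks; drop those.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Cześć"))          # Czesc
print(strip_accents("Björn, Łukasz"))  # Bjorn, Lukasz
```

Unlike unidecode, this sketch does not transliterate other scripts: Greek or Cyrillic letters lose their accents but stay non-ASCII, so the result is a str that may still contain non-ASCII characters.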
    
