Replace special characters with ASCII equivalent

星月不相逢 2020-12-08 10:16

Is there a library that can replace special characters with their ASCII equivalents, for example:

"Cześć"

to:

"Czesc"

6 answers
  •  情话喂你
    2020-12-08 10:23

    I did it this way:

    import unicodedata

    # Python 3 version. Each key packs the two UTF-8 bytes of a Polish letter
    # into one integer: (first_byte << 8) + second_byte, e.g. 'ą' = 0xC485 = 50309.
    POLISH_CHARACTERS = {
        50309:'a',50311:'c',50329:'e',50562:'l',50564:'n',50099:'o',50587:'s',50618:'z',50620:'z',
        50308:'A',50310:'C',50328:'E',50561:'L',50563:'N',50067:'O',50586:'S',50617:'Z',50619:'Z',}

    def encodePL(text):
        # Compose first so every accented letter is a single code point
        nrmtxt = unicodedata.normalize('NFC', text)
        ret_str = []
        for ch in nrmtxt:
            if ord(ch) > 127:  # non-ASCII character
                raw = ch.encode('utf-8')
                lkey = (raw[0] << 8) + raw[1]
                # Characters missing from the table are kept unchanged
                ret_str.append(POLISH_CHARACTERS.get(lkey, ch))
            else:  # pure ASCII character
                ret_str.append(ch)
        return ''.join(ret_str)


    When executed:

    encodePL('ąćęłńóśźż ĄĆĘŁŃÓŚŹŻ')


    it produces this output:

    'acelnoszz ACELNOSZZ'
    

    This works fine for me - ;D
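
For letters outside the Polish alphabet, a more general variant of the same idea may be worth trying. The sketch below is only an illustration, not part of the answer above. It assumes Python 3 and uses only the standard library: NFKD normalization splits an accented letter into a base letter plus combining marks, and the marks are then dropped. Letters such as 'ł' have no Unicode decomposition, so they still need an explicit table. The names EXTRA and to_ascii are hypothetical, and the table is deliberately incomplete.

```python
import unicodedata

# Letters with no Unicode decomposition need an explicit mapping.
# Hypothetical, deliberately incomplete table; extend it as needed.
EXTRA = str.maketrans({
    'ł': 'l', 'Ł': 'L',
    'ø': 'o', 'Ø': 'O',
    'đ': 'd', 'Đ': 'D',
    'ß': 'ss',
})

def to_ascii(text):
    """Return an ASCII approximation of text.

    Anything that is still non-ASCII after the steps below is dropped.
    """
    # 1. Replace the letters that normalization cannot decompose
    text = text.translate(EXTRA)
    # 2. Split accented letters into base letter + combining marks
    decomposed = unicodedata.normalize('NFKD', text)
    # 3. Drop the combining marks
    stripped = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    # 4. Discard whatever non-ASCII characters remain
    return stripped.encode('ascii', 'ignore').decode('ascii')

print(to_ascii('Cześć'))                 # Czesc
print(to_ascii('ąćęłńóśźż ĄĆĘŁŃÓŚŹŻ'))   # acelnoszz ACELNOSZZ
```

The trade-off is that step 4 silently removes characters it cannot map (for example CJK text), so this only suits cases where losing them is acceptable.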
