latin-1 to ascii

佛祖请我去吃肉 2020-11-30 01:46

I have a Unicode string with accented Latin characters, e.g.

n = unicode('Wikipédia, le projet d’encyclopédie', 'utf-8')

I want to convert it to

6 answers
  •  挽巷 (original poster)
    2020-11-30 02:14

    The "correct" way to do this is to register your own error handler for Unicode encoding/decoding and, in that handler, supply the replacements: è to e, ö to o, and so on.

    Like so:

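    # -*- coding: UTF-8 -*-
    import codecs

    # Replacements for characters that ASCII cannot represent.
    # (Renamed from `map` so the builtin is not shadowed.)
    replacements = {u'é': u'e',
                    u'’': u"'",
                    # ETC
                    }

    def asciify(error):
        # Only encoding errors are handled here; re-raise anything else.
        if not isinstance(error, UnicodeEncodeError):
            raise error
        # One error can cover a run of adjacent unencodable characters,
        # so map each of them; unmapped characters become '?'.
        bad = error.object[error.start:error.end]
        return u''.join(replacements.get(c, u'?') for c in bad), error.end

    codecs.register_error('asciify', asciify)

    test = u'Wikipédia, le projet d’encyclopédie'
    # Decode the ASCII bytes so this prints as text on Python 2 and 3.
    print(test.encode('ascii', 'asciify').decode('ascii'))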
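
    A Python 3 variant of the same idea might look like the sketch below. The table `REPLACEMENTS`, the helper `to_ascii`, and the `?` fallback for unmapped characters are assumptions added for illustration, not part of the original answer. The handler maps every character in the failing span, since the ASCII codec can report a run of adjacent unencodable characters as a single error.

```python
import codecs

# Hypothetical replacement table; extend it for the characters you expect.
REPLACEMENTS = {
    "é": "e",
    "è": "e",
    "ö": "o",
    "’": "'",
}


def asciify(error):
    """Error handler: replace each unencodable character via REPLACEMENTS."""
    if not isinstance(error, UnicodeEncodeError):
        raise error
    # The failing span may hold several characters, so map each one.
    bad = error.object[error.start:error.end]
    # Assumption: unmapped characters fall back to "?" instead of raising.
    replacement = "".join(REPLACEMENTS.get(ch, "?") for ch in bad)
    # Return the replacement text and the position where encoding resumes.
    return replacement, error.end


codecs.register_error("asciify", asciify)


def to_ascii(text):
    """Encode with the custom handler, then decode back to a str."""
    return text.encode("ascii", "asciify").decode("ascii")


print(to_ascii("Wikipédia, le projet d’encyclopédie"))  # Wikipedia, le projet d'encyclopedie
print(to_ascii("éö"))     # eo
print(to_ascii("naïve"))  # na?ve
```

    Falling back to `?` keeps the conversion from failing on characters missing from the table; raising instead would be a reasonable choice if silent substitution is not acceptable.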
    

    You might also find something suitable in IBM's ICU library and its Python bindings, PyICU, which might be less work.
