latin-1 to ascii

佛祖请我去吃肉 2020-11-30 01:46

I have a unicode string with accented latin chars e.g.

n = unicode('Wikipédia, le projet d’encyclopédie', 'utf-8')

I want to convert it to

6 answers

挽巷 (original poster)

2020-11-30 02:14
The "correct" way to do this is to register your own error handler for unicode encoding/decoding, and in that error handler provide the replacements from è to e and ö to o, etc.

Like so:
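```
# -*- coding: UTF-8 -*-
import codecs

# Replacements for characters that ASCII cannot represent.
replacements = {u'é': u'e',
                u'’': u"'",
                # ETC
                }

def asciify(error):
    # error.start:error.end can cover a run of several unencodable
    # characters, so translate every character in the run.
    bad = error.object[error.start:error.end]
    return u''.join(replacements[ch] for ch in bad), error.end

codecs.register_error('asciify', asciify)

test = u'Wikipédia, le projet d’encyclopédie'
print(test.encode('ascii', 'asciify'))
```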
You might also find something suitable in IBM's ICU library and its Python bindings, PyICU, which might be less work.
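A hand-written table has to list every accented character you expect to meet. As a rough sketch of one way around that (this is not part of the answer above and does not use ICU), the same error-handler approach can fall back to the standard library's `unicodedata` decomposition. The handler name `asciify_nfkd`, the `EXTRA` table, and the `?` placeholder are all assumptions made for illustration, and the code targets Python 3:

```python
import codecs
import unicodedata

# Hypothetical extras: characters that have no ASCII decomposition.
EXTRA = {"’": "'", "‘": "'", "“": '"', "”": '"', "–": "-", "æ": "ae", "ß": "ss"}

def asciify_nfkd(error):
    """Error handler: replace each unencodable character with an ASCII fallback."""
    if not isinstance(error, UnicodeEncodeError):
        raise error
    out = []
    for ch in error.object[error.start:error.end]:
        if ch in EXTRA:
            out.append(EXTRA[ch])
            continue
        # NFKD splits 'é' into 'e' plus a combining accent; keep the ASCII part.
        decomposed = unicodedata.normalize("NFKD", ch)
        ascii_part = "".join(c for c in decomposed if ord(c) < 128)
        # Stray combining marks are dropped; anything else unknown becomes '?'.
        if not ascii_part and not unicodedata.combining(ch):
            ascii_part = "?"
        out.append(ascii_part)
    return "".join(out), error.end

codecs.register_error("asciify_nfkd", asciify_nfkd)

text = "Wikipédia, le projet d’encyclopédie"
result = text.encode("ascii", "asciify_nfkd").decode("ascii")
print(result)
```

Decomposition only helps for letters that are a base letter plus accents. Ligatures, typographic quotes, and similar characters still need explicit entries in a table such as `EXTRA`.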