I have a unicode string with accented latin chars e.g.
n=unicode(\'Wikipédia, le projet d’encyclopédie\',\'utf-8\')
I want to convert it to
The "correct" way to do this is to register your own error handler for unicode encoding/decoding, and in that error handler provide the replacements from è to e and ö to o, etc.
Like so:
# -*- coding: UTF-8 -*-
import codecs
map = {u'é': u'e',
u'’': u"'",
# ETC
}
def asciify(error):
return map[error.object[error.start]], error.end
codecs.register_error('asciify', asciify)
test = u'Wikipédia, le projet d’encyclopédie'
print test.encode('ascii', 'asciify')
You might also find something in IBM's ICU library and it's Python bindings PyICU, though, it might be less work.