don\'t know wether this is trivial or not, but I\'d need to convert an unicode string to ascii string, and I wouldn\'t like to have all those escape chars around. I mean, is
There is a technique to strip accents from characters, but other characters need to be directly replaced. Check this article: http://effbot.org/zone/unicode-convert.htm
Try simple character replacement
str1 = "“I am the greatest”, said Gavin O’Connor"
print(str1)
print(str1.replace("’", "'").replace("“","\"").replace("”","\""))
PS: add # -*- coding: utf-8 -*-
to the top of your .py
file if you get error
b = str(a.encode('utf-8').decode('ascii', 'ignore'))
should work fine.
Use the Unidecode package to transliterate the string.
>>> import unidecode
>>> unidecode.unidecode(u'Gavin O’Connor')
"Gavin O'Connor"
import unicodedata
unicode_string = u"Gavin O’Connor"
print unicodedata.normalize('NFKD', unicode_string).encode('ascii','ignore')
Output:
Gavin O'Connor
Here's the document that describes the normalization forms: http://unicode.org/reports/tr15/