Approximately converting a Unicode string to an ASCII string in Python

甜味超标 2020-12-24 14:04

I don't know whether this is trivial or not, but I'd need to convert a Unicode string to an ASCII string, and I wouldn't like to have all those escape chars around. I mean, is

5 Answers
  • 2020-12-24 14:41

    There is a technique to strip accents from characters, but other characters need to be directly replaced. Check this article: http://effbot.org/zone/unicode-convert.htm
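
    A minimal sketch of the accent-stripping technique, assuming Python 3 and only the standard library. The helper name strip_accents is hypothetical and not taken from the linked article. It decomposes each character and drops the combining marks; letters with no decomposition (such as ø) pass through unchanged, which is why they still need direct replacement.

```python
import unicodedata

def strip_accents(text):
    """Decompose characters (NFD), then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("café naïve Ångström"))  # cafe naive Angstrom
print(strip_accents("smørrebrød"))           # smørrebrød (ø has no decomposition)
```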

  • 2020-12-24 14:43

    Try simple character replacement

    str1 = "“I am the greatest”, said Gavin O’Connor"
    print(str1)
    print(str1.replace("’", "'").replace("“","\"").replace("”","\""))
    

    PS: on Python 2, add # -*- coding: utf-8 -*- to the top of your .py file if you get an encoding error (Python 3 source files default to UTF-8).
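
    When more than a few characters are involved, the chained replace calls can be folded into one lookup table. This is only a sketch, assuming Python 3; the table contents and the names PUNCTUATION_MAP and replace_punctuation are illustrative, so extend the mapping to whatever your data actually contains.

```python
# Hypothetical mapping table: typographic punctuation -> ASCII stand-ins.
PUNCTUATION_MAP = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2026": "...",  # horizontal ellipsis (one character becomes three)
})

def replace_punctuation(text):
    """Apply every replacement in a single pass over the string."""
    return text.translate(PUNCTUATION_MAP)

str1 = "\u201cI am the greatest\u201d, said Gavin O\u2019Connor"
print(replace_punctuation(str1))  # "I am the greatest", said Gavin O'Connor
```

    Characters that are not in the table are left untouched, so the result is only guaranteed to be ASCII if the table covers everything in the input.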

  • 2020-12-24 14:52
    b = str(a.encode('utf-8').decode('ascii', 'ignore'))
    

    This should work, but note that it silently drops every non-ASCII character instead of approximating it.
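
    To see what the error handler does, here is a small comparison, assuming Python 3. The variable names are illustrative. Encoding straight to ASCII with 'ignore' gives the same result as the utf-8 round trip, while 'replace' and 'backslashreplace' keep a visible marker for each lost character.

```python
a = "Gavin O\u2019Connor caf\u00e9"

# Drop anything that is not ASCII (same result as the utf-8 round trip above).
ignored = a.encode("ascii", "ignore").decode("ascii")
# Substitute a question mark for each non-ASCII character.
replaced = a.encode("ascii", "replace").decode("ascii")
# Keep Python-style escape sequences instead.
escaped = a.encode("ascii", "backslashreplace").decode("ascii")

print(ignored)   # Gavin OConnor caf
print(replaced)  # Gavin O?Connor caf?
print(escaped)   # Gavin O\u2019Connor caf\xe9
```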

  • 2020-12-24 14:55

    Use the Unidecode package to transliterate the string.

    >>> import unidecode
    >>> unidecode.unidecode(u'Gavin O’Connor')
    "Gavin O'Connor"
    
  • 2020-12-24 14:55
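    import unicodedata
    
    unicode_string = u"Gavin O’Connor"
    # NFKD splits accented letters into a base letter plus combining marks;
    # encoding with 'ignore' then drops whatever is still not ASCII.
    print(unicodedata.normalize('NFKD', unicode_string).encode('ascii', 'ignore').decode('ascii'))
    

    Output (the ’ is dropped, because U+2019 has no ASCII decomposition):

    Gavin OConnor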
    

    Here's the document that describes the normalization forms: http://unicode.org/reports/tr15/
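
    As a rough illustration of how the four normalization forms differ for this purpose, the following sketch (Python 3 assumed, helper name to_ascii is hypothetical) normalizes a few sample characters and then drops whatever is not ASCII. Canonical forms (NFC, NFD) leave the ligature alone, compatibility forms (NFKC, NFKD) expand it, and only the decomposed forms expose the base letter of é.

```python
import unicodedata

def to_ascii(text, form):
    """Normalize with the given form, then discard non-ASCII characters."""
    normalized = unicodedata.normalize(form, text)
    return normalized.encode("ascii", "ignore").decode("ascii")

samples = {
    "e with acute": "\u00e9",
    "fi ligature": "\ufb01",
    "right single quote": "\u2019",
}

for name, ch in samples.items():
    for form in ("NFC", "NFD", "NFKC", "NFKD"):
        print(f"{name:20} {form:5} -> {to_ascii(ch, form)!r}")
```

    The right single quotation mark comes out empty in every form, so punctuation like that still needs an explicit replacement or a transliteration library.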
