How do I convert a unicode to a string at the Python level?

刺人心 2020-12-09 17:56

The following unicode and string can exist on their own if defined explicitly:

>>> value_str='Andr\xc3\xa9'
>>> value_uni=u'Andr\xc3\

7 Answers
  •  抹茶落季
    2020-12-09 18:32

    If you have u'Andr\xc3\xa9', you have a Unicode string that was decoded from a byte string with the wrong encoding; the bytes were actually UTF-8. To get the original bytes back so you can decode them correctly, use the trick you discovered. The first 256 Unicode code points map 1:1 to the byte values of ISO-8859-1 (alias latin1), so encoding with latin1 returns the original bytes unchanged:

    >>> u'Andr\xc3\xa9'.encode('latin1')
    'Andr\xc3\xa9'
    

    Now it is a byte string that can be decoded correctly with utf8:

    >>> 'Andr\xc3\xa9'.decode('utf8')
    u'Andr\xe9'
    >>> print 'Andr\xc3\xa9'.decode('utf8')
    André
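
As a rough illustration of why the latin1 step loses nothing, the sketch below checks the 1:1 mapping for all 256 byte values and reproduces how a mis-decoded string like the one above could arise. It assumes Python 3 (the session above is Python 2), where `str` is Unicode text and `bytes` is a byte string; the variable names are illustrative only.

```python
# Python 3 sketch (assumption: the original session uses Python 2 syntax).

# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# Decoding as latin1 maps byte N to code point U+00NN.
as_text = all_bytes.decode('latin1')
code_points = [ord(ch) for ch in as_text]

# Encoding back with latin1 recovers the exact original bytes.
round_trip = as_text.encode('latin1')

# How the broken string can arise: UTF-8 bytes decoded as latin1.
utf8_bytes = 'Andr\xe9'.encode('utf8')       # b'Andr\xc3\xa9'
mis_decoded = utf8_bytes.decode('latin1')    # 'Andr\xc3\xa9'

print(code_points == list(range(256)))
print(round_trip == all_bytes)
print(ascii(mis_decoded))
```

Because the mapping is exact in both directions, encoding the damaged text with latin1 is a safe way to get back to the raw bytes before decoding them as UTF-8.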
    

    In one step:

    >>> print u'Andr\xc3\xa9'.encode('latin1').decode('utf8')
    André
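
The one-step round trip can also be wrapped in a small helper. This is a minimal sketch, assuming Python 3 rather than the Python 2 used above; `fix_mojibake` is a hypothetical name, and returning the input unchanged on failure is an assumed design choice, not part of the original answer.

```python
# Python 3 sketch; fix_mojibake is a hypothetical helper name.
def fix_mojibake(text):
    """Undo UTF-8 bytes that were mistakenly decoded as latin1.

    Returns the input unchanged when the round trip is not possible.
    """
    try:
        # latin1 turns code points U+0000..U+00FF back into the raw bytes,
        # which are then decoded with the encoding they were written in.
        return text.encode('latin1').decode('utf8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters above U+00FF, or the recovered
        # bytes are not valid UTF-8; assume the text was not damaged.
        return text


repaired = fix_mojibake('Andr\xc3\xa9')   # the mis-decoded string
untouched = fix_mojibake('Andr\xe9')      # already correct text

print(repaired)
print(untouched)
```

The fallback matters because a string that is already correct, such as 'Andr\xe9', yields bytes that are not valid UTF-8, so the decode step would otherwise raise an exception.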
    
