Question
In Python 3, suppose I have
>>> thai_string = 'สีเ'
Using encode gives
>>> thai_string.encode('utf-8')
b'\xe0\xb8\xaa\xe0\xb8\xb5\xe0\xb9\x80'
My question: how can I get encode() to return a bytes sequence using \u instead of \x? And how can I decode them back to a Python 3 str type?
I tried using the ascii builtin, which gives
>>> ascii(thai_string)
"'\\u0e2a\\u0e35\\u0e40'"
But this doesn't seem quite right, as I can't decode it back to obtain thai_string.
Python documentation tells me that
\xhh escapes the character with the hex value hh, while \uxxxx escapes the character with the 16-bit hex value xxxx
The documentation says that \u is only used in string literals, but I'm not sure what that means. Is this a hint that my question has a flawed premise?
Answer 1:
You can use unicode_escape:
>>> thai_string.encode('unicode_escape')
b'\\u0e2a\\u0e35\\u0e40'
Note that encode() will always return a byte string (bytes) and the unicode_escape encoding is intended to:
Produce a string that is suitable as Unicode literal in Python source code
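To get back to a str, decoding the escaped bytes with the same codec should work. The following is a minimal sketch of the round trip, not a definitive recipe: the variable names are only illustrative, and the string is written with \u escapes so the snippet does not depend on the source file's encoding. It also shows one possible way to reverse the ascii() output from the question, using ast.literal_eval, on the assumption that the input is a trusted string literal.

```python
import ast

# Same three code points as the Thai string in the question.
thai_string = '\u0e2a\u0e35\u0e40'

# str -> bytes containing literal backslash-u escapes (ASCII only).
escaped = thai_string.encode('unicode_escape')
print(escaped)

# bytes -> str: decoding with the same codec reverses the escaping.
restored = escaped.decode('unicode_escape')
print(restored == thai_string)

# ascii() returns a str holding a quoted, escaped literal rather than bytes.
# Parsing that literal recovers the original string.
ascii_repr = ascii(thai_string)
restored_from_repr = ast.literal_eval(ascii_repr)
print(restored_from_repr == thai_string)
```

One caveat: when decoding, unicode_escape treats any non-ASCII bytes as Latin-1, so it is safest to decode only bytes that were produced by the same codec.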
Source: https://stackoverflow.com/questions/32280753/how-to-encode-python-3-string-using-u-escape-code