How to decode unicode string that is read from a file in Python?

Question

I have a file containing UTF-16 strings. When I read it, the text comes back wrapped in double quotes and looks like "b'\\xff\\xfeA\\x00'". Calling the built-in .decode method on it raises AttributeError: 'str' object has no attribute 'decode'. I tried a few other options, but none of them worked.

This is what the file I am reading from looks like

Answer 1:

Try this:

str.encode().decode()
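
Note that calling encode() and then decode() on a str returns the same text, so on its own it will not undo the "b'...'" wrapping. If the text read from the file really is the printed form of a bytes object, as the sample in the question suggests, one possible way to recover the original text is to parse it back into bytes with ast.literal_eval and then decode it. This is only a sketch: the sample value is taken from the question, and the variable names are made up for illustration.

```python
import ast

# The text as read from the file: the repr of a bytes object, not real bytes.
raw_text = "b'\\xff\\xfeA\\x00'"

# Encoding and decoding again just round-trips the same characters.
assert raw_text.encode().decode() == raw_text

# Safely evaluate the literal to get the actual bytes object back.
recovered_bytes = ast.literal_eval(raw_text)

# Decode as UTF-16; the leading \xff\xfe is the byte order mark.
decoded = recovered_bytes.decode('utf-16')
print(decoded)  # prints: A
```

ast.literal_eval only accepts Python literals, so it is a safer choice here than eval for text that comes from a file.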

Answer 2:

It looks like the file was created by writing the string form of a bytes object to it, something like this:

some_bytes = b'Hello world'
with open('myfile.txt', 'w') as f:
    f.write(str(some_bytes))

This gets around the fact that attempting to write bytes to a file opened in text mode raises an error, but at the cost that the file now contains "b'Hello world'" (note the 'b' inside the quotes).

The solution is to decode the bytes to str before writing:

some_bytes = b'Hello world'
# These sample bytes are ASCII/UTF-8; use 'utf-16' (or whatever applies) if your bytes were encoded that way
my_str = some_bytes.decode('utf-8')
with open('myfile.txt', 'w') as f:
    f.write(my_str)

Alternatively, open the file in binary mode and write the bytes directly:

some_bytes = b'Hello world'
with open('myfile.txt', 'wb') as f:
    f.write(some_bytes)

Note that you will need to provide the correct encoding when opening the file in text mode:

with open('myfile.txt', encoding='utf-16') as f:  # Be sure to use the correct encoding
    text = f.read()
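
To see how binary writing and text-mode reading fit together, here is a small sketch that writes UTF-16 bytes in binary mode and reads them back in text mode with the matching encoding. The temporary file path and the sample text are assumptions chosen for illustration.

```python
import os
import tempfile

text = 'Hello world'
utf16_bytes = text.encode('utf-16')  # the bytes start with a byte order mark

# Use a throwaway file so the example does not touch real data.
path = os.path.join(tempfile.mkdtemp(), 'myfile.txt')

# Binary mode: the bytes are written exactly as they are.
with open(path, 'wb') as f:
    f.write(utf16_bytes)

# Text mode: Python decodes the bytes using the encoding given here.
with open(path, encoding='utf-16') as f:
    read_back = f.read()

print(read_back == text)  # prints: True
```

If the encoding passed to open() does not match the bytes on disk, the read will typically either raise a UnicodeDecodeError or return garbled text.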

Consider running Python with the -b or -bb flag (for example, python -bb script.py), which makes it issue a warning or raise an exception, respectively, to detect attempts to stringify bytes.
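
As a rough illustration of the -bb flag, the sketch below runs the same one-line program in a child interpreter with and without the flag. It assumes a Python version in which -bb is active; the snippet and variable names are made up for the example.

```python
import subprocess
import sys

# A one-line program that stringifies a bytes object.
snippet = "str(b'Hello world')"

# Without the flag, the call succeeds silently.
plain = subprocess.run([sys.executable, '-c', snippet],
                       capture_output=True, text=True)

# With -bb, the same call raises BytesWarning as an exception.
strict = subprocess.run([sys.executable, '-bb', '-c', snippet],
                        capture_output=True, text=True)

print(plain.returncode)                  # exit status without the flag
print(strict.returncode)                 # exit status with -bb
print('BytesWarning' in strict.stderr)   # whether the traceback names BytesWarning
```

Running a test suite under -bb is one way to catch accidental str(some_bytes) calls before they end up in a file.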

Source: https://stackoverflow.com/questions/65168223/how-to-decode-unicode-string-that-is-read-from-a-file-in-python

Tags

python

python-3.x

character-encoding

utf-16

python-unicode