How to decode unicode string that is read from a file in Python?

烈酒焚心 submitted on 2021-02-11 13:22:31

Question


I have a file containing UTF-16 strings. When I read it, each value comes back as a str wrapped in " " (double quotes) that looks like "b'\\xff\\xfeA\\x00'" rather than as bytes. Calling the built-in .decode() on it raises AttributeError: 'str' object has no attribute 'decode'. I tried a few other options, but none of them worked.

This is what the file I am reading from looks like


Answer 1:


Try this:

str.encode().decode()



Answer 2:


It looks like the file has been created by writing bytes literals to it, something like this:

some_bytes = b'Hello world'
with open('myfile.txt', 'w') as f:
    f.write(str(some_bytes))

This gets around the fact that attempting to write bytes to a file opened in text mode raises an error, but at the cost that the file now contains the text "b'Hello world'" (note the 'b' inside the quotes), which is the repr of the bytes object rather than its content.
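If a file has already been written this way, one possible way to read it back is to parse each stored literal into a real bytes object and then decode it. The sketch below assumes the text is a valid Python bytes literal and that the bytes are UTF-16, as in the question; `recover_text` is a hypothetical helper name, not something from the original post.

```python
import ast


def recover_text(literal, encoding="utf-16"):
    """Convert text such as b'\\xff\\xfeA\\x00' (stored as a str) back to real text."""
    # ast.literal_eval only evaluates literals, so it is safer than eval().
    raw = ast.literal_eval(literal.strip())
    if not isinstance(raw, bytes):
        raise ValueError(f"expected a bytes literal, got {type(raw).__name__}")
    return raw.decode(encoding)


# The value from the question, exactly as it would appear in the file.
line = r"b'\xff\xfeA\x00'"
recovered = recover_text(line)
print(recovered)
```

This only repairs the data on the reading side. Fixing the code that writes the file is still the cleaner option.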

The solution is to decode the bytes to str before writing:

some_bytes = 'Hello world'.encode('utf-16')  # example UTF-16 data (b'Hello world' itself is not valid UTF-16)
my_str = some_bytes.decode('utf-16')  # or whatever the encoding of the bytes might be
with open('myfile.txt', 'w') as f:
    f.write(my_str)

Alternatively, open the file in binary mode and write the bytes directly:

some_bytes = b'Hello world'
with open('myfile.txt', 'wb') as f:
    f.write(some_bytes)

Note that you will need to provide the correct encoding if opening the file in text mode:

with open('myfile.txt', encoding='utf-16') as f:  # Be sure to use the correct encoding
    my_str = f.read()
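The two halves above can be checked together with a small round trip: write UTF-16 bytes in binary mode, then read them back in text mode with a matching encoding. This is a minimal sketch; the temporary directory and the variable names `path` and `round_tripped` are illustrative assumptions.

```python
import os
import tempfile

text = "Hello world"
some_bytes = text.encode("utf-16")  # the 'utf-16' codec prepends a byte order mark

# Write to a throwaway location so the example leaves no files behind in the cwd.
path = os.path.join(tempfile.mkdtemp(), "myfile.txt")

# Binary mode: the bytes are written exactly as they are.
with open(path, "wb") as f:
    f.write(some_bytes)

# Text mode: Python decodes while reading, so the encoding must match.
with open(path, encoding="utf-16") as f:
    round_tripped = f.read()

print(round_tripped == text)
```

Opening the same file without `encoding='utf-16'` would fall back to the platform default encoding, which would most likely either fail to decode or produce garbled text.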

Consider running Python with the -b or -bb flag, which raise a warning or an exception respectively, to detect attempts to stringify bytes.
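To see what the flag does, the sketch below runs the same one-line snippet in two child interpreters, once normally and once with -bb, and compares the results. It assumes the interpreter in use still supports the -b/-bb options; the names `snippet`, `plain` and `strict` are illustrative.

```python
import subprocess
import sys

# The mistake to detect: calling str() on a bytes object.
snippet = "str(b'Hello world')"

# Without the flag the conversion succeeds silently.
plain = subprocess.run(
    [sys.executable, "-c", snippet], capture_output=True, text=True
)

# With -bb the same call is expected to raise BytesWarning as an exception.
strict = subprocess.run(
    [sys.executable, "-bb", "-c", snippet], capture_output=True, text=True
)

print("plain exit code:", plain.returncode)
print("strict exit code:", strict.returncode)
print("BytesWarning reported:", "BytesWarning" in strict.stderr)
```

In day-to-day use the flag would simply be passed when launching the script or test suite, for example `python -bb my_script.py`, where `my_script.py` stands in for your own entry point.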



Source: https://stackoverflow.com/questions/65168223/how-to-decode-unicode-string-that-is-read-from-a-file-in-python
