Python open("x", "r") function: how do I know or control which encoding the file is supposed to have?

Submitted by 北慕城南 on 2019-12-05 09:10:29

You can't. A file is just a sequence of bytes, and reading those bytes is independent of their encoding; you need to know the encoding in advance in order to interpret them properly.
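To make that concrete, here is a small sketch (the sample string is my own, not from the question) that decodes the same bytes under three different assumptions. Only one of them gives the intended text, and nothing in the bytes themselves says which.

```python
# The same bytes, interpreted under three different encoding assumptions.
raw = u'caf\u00e9'.encode('utf-8')     # b'caf\xc3\xa9', as it would sit on disk

as_utf8 = raw.decode('utf-8')          # right assumption: 'caf\xe9' (café)
as_latin1 = raw.decode('latin-1')      # wrong assumption: no error, just mojibake

try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:             # bytes >= 128 are not valid ASCII
    ascii_ok = False

print(repr(raw), repr(as_utf8), repr(as_latin1), ascii_ok)
```

Note that the wrong latin-1 guess does not raise an error; it silently produces the wrong characters, which is why the encoding has to be known rather than discovered by accident.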

For example, if you know the file is encoded in UTF-8:

with open('filename', 'rb') as f:
    contents = f.read().decode('utf-8-sig')    # -sig deals with BOM, if present

Or if you know the file is ASCII only:

with open('filename', 'r', encoding='ascii') as f:    # Python 3: name the encoding instead of relying on the locale default
    contents = f.read()    # results in a str object

If you really don't know the encoding of the file, then there's obviously no guarantee that you can read it properly; however, you can guess at the encoding using a tool like chardet.
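chardet does statistical detection; as a much cruder standard-library stand-in, you can simply try a short list of candidate encodings in order. This is only a sketch: the function name `guess_decode` and the candidate list are my own assumptions, and the result is a guess, not a detection.

```python
def guess_decode(raw, candidates=('utf-8-sig', 'cp1252', 'latin-1')):
    """Return (text, encoding) for the first candidate that decodes raw without error.

    The candidate list is an assumption; order it by what is plausible for your data.
    latin-1 accepts every byte sequence, so it acts as a last-resort catch-all.
    """
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError('none of the candidate encodings fit')


with_bom = b'\xef\xbb\xbfhello'
text, enc = guess_decode(with_bom)
print(repr(text), enc)
```

A successful decode only means the bytes are valid in that encoding, not that the encoding is the one the author used, so treat the answer with the same suspicion as any other guess.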

UPDATE:

I think I understand your question now. I thought you had a file you needed to write code for, but it seems you have code you need to write a file for ;-)

The code in question probably only deals properly with plain ASCII (it's possible the strings are converted later, but unlikely I think). So you'll want to make a text file that contains only ASCII (codepoint < 128) characters, and make sure it is saved in an ASCII encoding (i.e. not UTF-16 or anything like that). This is a little unfortunate considering that Mercurial deals with filenames, which can contain Unicode characters.
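If it helps, here is a rough Python 3 sketch of producing such a file and double-checking it. The file name `sample.txt` and the helper `is_ascii_file` are made up for illustration; they are not part of the code in question.

```python
import os
import tempfile


def is_ascii_file(path):
    """Return True if every byte in the file is below 128."""
    with open(path, 'rb') as f:
        return all(b < 128 for b in bytearray(f.read()))


# encoding='ascii' makes the write fail loudly if non-ASCII text slips in.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w', encoding='ascii') as f:
    f.write('plain ASCII only\n')

print(is_ascii_file(path))
```

Writing with an explicit `encoding='ascii'` catches the problem at save time, and the byte check catches files that were saved by some other tool.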
