How to determine the encoding of text?

一向 2020-11-21 07:47

I received some text that is encoded, but I don't know what charset was used. Is there a way to determine the encoding of a text file using Python? How can I detect the enc

10 answers
  •  孤城傲影
    2020-11-21 08:07

    If your platform has it, one option is to call the shell's file command, which is what I do on Linux. This works for me because the script runs exclusively on one of our Linux machines.

    Obviously this isn't an ideal solution or answer, but it could be modified to fit your needs. In my case I just need to determine whether a file is UTF-8 or not.

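    import subprocess

    def is_utf8(path='test.txt'):
        # Run `file <path>`; check_output returns bytes, so decode to str
        cmd_output = subprocess.check_output(['file', path]).decode()
        # x is the description that follows the "<path>: " prefix
        x = cmd_output.split(": ", 1)[1]
        # The wording can vary between versions of `file`, so search for
        # 'UTF-8' anywhere in the description rather than only at the start
        return 'UTF-8' in x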
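
If you cannot rely on the file command being installed, the same UTF-8-or-not question can be answered in pure Python by attempting a strict decode. This is a different technique from the script above, not a port of it. It is only a minimal sketch: it assumes the file is small enough to read into memory, and the helper names is_utf8_bytes and is_utf8_file are made up for illustration.

```python
def is_utf8_bytes(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (pure ASCII also passes)."""
    try:
        data.decode("utf-8")  # the default error handler is 'strict'
    except UnicodeDecodeError:
        return False
    return True


def is_utf8_file(path: str) -> bool:
    """Read the whole file in binary mode and test its bytes."""
    with open(path, "rb") as f:
        return is_utf8_bytes(f.read())
```

Unlike the file-based check, this returns True for plain ASCII as well, since ASCII is a subset of UTF-8. A successful decode only shows the bytes are valid UTF-8, not that the author intended that encoding, so treat it as a filter rather than a full charset detector.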
    
