'utf-8' codec can't decode byte reading a file in Python3.4 but not in Python2.7

没有蜡笔的小新 2021-01-04 19:53

I was reading a file in Python 2.7, and it was read perfectly. The problem is that when I run the same program in Python 3.4, a "'utf-8' codec can't decode byte" error appears.

2 Answers
  • 2021-01-04 20:11

    In Python2,

    f = open(filename, 'r')
    for line in f:
        pass  # process each line here
    

    reads lines from the file as bytes.

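    In Python3, the same code reads lines from the file as strings. Python3 strings are what Python2 calls unicode objects: the result of decoding bytes according to some encoding. When no encoding is passed, open() uses the platform's preferred encoding (locale.getpreferredencoding(False)), which is utf-8 on most Linux and macOS systems.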
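
    A small sketch of that difference, run under Python3: binary mode yields bytes, text mode yields str, and the encoding used when none is passed can be inspected on the file object. The temporary file and its contents are made up for illustration.

```python
import locale
import os
import tempfile

# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write("first line\nsecond line\n".encode("ascii"))
    filename = tmp.name

# Binary mode: each line is a bytes object, nothing is decoded.
with open(filename, "rb") as f:
    raw_line = f.readline()

# Text mode: each line is decoded to str using the default encoding.
with open(filename, "r") as f:
    text_line = f.readline()
    used_encoding = f.encoding

print(type(raw_line), type(text_line))
print("open() decoded with:", used_encoding)
print("locale preferred encoding:", locale.getpreferredencoding(False))

os.remove(filename)
```

    The printed encoding depends on the machine, which is why the same script can work on one system and fail on another.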

    The error message

    'utf-8' codec can't decode byte 0xf2 in position 424: invalid continuation byte
    

    shows Python3 is trying to decode the bytes as utf-8. Since there is an error, the file apparently does not contain utf-8 encoded bytes.
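
    The same kind of failure can be reproduced without the original file. The sketch below assumes, purely for illustration, that the text was saved as cp1250; the sample string is made up.

```python
# Made-up sample text. In cp1250 the letter "ň" is stored as the single
# byte 0xf2. In UTF-8, 0xf2 starts a four-byte sequence, so the next byte
# must be a continuation byte (0x80-0xbf); the space that follows is not.
data = "Plzeň 2021".encode("cp1250")

message = None
bad_byte = None
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    message = str(exc)
    bad_byte = data[exc.start]  # the byte the decoder stopped on
    print(message)
    print("offending byte:", hex(bad_byte), "at position", exc.start)
```

    The exception's start attribute gives the offset of the first undecodable byte, which can help locate the problem character in the file.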

    To fix the problem you need to specify the correct encoding of the file:

    # enc is the name of the file's actual encoding, for example 'cp1250'
    with open(filename, encoding=enc) as f:
        for line in f:
            print(line, end='')
    

    If you do not know the correct encoding, you could run this program to simply try all the encodings known to Python. If you are lucky there will be an encoding which turns the bytes into recognizable characters. Sometimes more than one encoding may appear to work, in which case you'll need to check and compare the results carefully.

    # Python3
    import encodings.aliases
    import os
    import pkgutil
    
    def all_encodings():
        """Return the names of all codec modules and aliases shipped with Python."""
        modnames = {
            modname for _, modname, _ in pkgutil.walk_packages(
                path=[os.path.dirname(encodings.__file__)], prefix='')}
        aliases = set(encodings.aliases.aliases.values())
        return modnames | aliases
    
    filename = '/tmp/test'
    # Iterate directly so the name of the encodings module is not rebound.
    for enc in sorted(all_encodings()):
        try:
            with open(filename, encoding=enc) as f:
                # print the encoding and the first 500 characters
                print(enc, f.read(500))
        except Exception:
            # skip names that are not text encodings or cannot decode the file
            pass
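
    When several encodings decode the data without raising an error, it helps to print the results side by side. A minimal sketch, assuming a hypothetical shortlist of candidates and made-up sample bytes:

```python
# Made-up sample: the bytes of "Plzeň" as a cp1250 editor would save them.
raw = "Plzeň".encode("cp1250")

# Hypothetical shortlist; in practice it would come from a scan like the one above.
candidates = ["utf-8", "cp1250", "latin-1", "cp1252"]

results = {}
for enc in candidates:
    try:
        results[enc] = raw.decode(enc)
    except UnicodeDecodeError:
        # This encoding rejects the byte sequence outright.
        continue

# Every remaining encoding "works", but only one gives the intended text.
for enc, text in results.items():
    print(enc, repr(text))
```

    Here utf-8 is rejected, while cp1250, latin-1 and cp1252 all succeed with different last letters, so only a reader who knows the expected text can pick the right one.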
    
  • 2021-01-04 20:35

    OK, I did what @unutbu told me. The result was a long list of encodings, one of which was cp1250, so I changed:

    f = open(filename,'r')
    

    to

    f = open(filename,'r', encoding='cp1250')
    

    as @triplee suggested. Now I can read my files.
