Python3 different behaviour between latin-1 and cp1252 when decoding unmapped characters

Posted by 老子叫甜甜 on 2021-02-10 16:59:59

Question


I'm trying to read a text file in Python 3 using the cp1252 encoding. The file contains bytes that cp1252 does not map (for instance, byte 0x8d).

with open(inputfilename, mode='r', encoding='cp1252') as inputfile:
    print(inputfile.readlines())

As expected, I get the following exception:

Traceback (most recent call last):
  File "test.py", line 9, in <module>
    print(inputfile.readlines())
  File "/usr/lib/python3.6/encodings/cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 14: character maps to <undefined>
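
To see which byte values trigger this error, one option is to try decoding each of the 256 possible bytes on its own. This is only a sketch run against whatever codec table ships with the local Python install; the variable name `undefined` is chosen here for illustration.

```python
# Collect every single-byte value that the cp1252 codec refuses to decode
# under the default "strict" error handler.
undefined = []
for value in range(256):
    try:
        bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        undefined.append(value)

# Print the unmapped byte values in hex; 0x8d should be among them.
print([hex(v) for v in undefined])
```

The list is short, which is why most cp1252 files decode without trouble and the error only appears for a handful of byte values.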

I'd like to understand why reading the same file with encoding latin-1 does not raise the same exception. Instead, byte 0x8d shows up as the escape sequence \x8d:

$ python3 test.py
['This is a test\x8d file\n']

As far as I know, byte 0x8d has no mapping in either encoding (latin-1 or cp1252). What am I missing? Why does Python 3 behave differently?
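
The difference can be reproduced without a file by decoding the same bytes in memory. The sketch below assumes the file content is exactly the bytes shown in the output above. It relies on the fact that Python's latin-1 codec maps every byte 0x00-0xFF to the code point with the same number, so 0x8d becomes U+008D (a non-printable C1 control character), while Python's cp1252 table leaves 0x8d unassigned.

```python
# Same bytes as in the question's output (assumed file content).
data = b"This is a test\x8d file\n"

# latin-1: every byte decodes, 0x8d becomes the code point U+008D.
text = data.decode("latin-1")
print(repr(text))          # repr() shows the control character as an escape
print(hex(ord(text[14])))  # code point of the character at position 14

# cp1252: 0x8d has no entry in the decoding table, so strict decoding raises.
try:
    data.decode("cp1252")
except UnicodeDecodeError as exc:
    print(exc)
```

So the `\x8d` in the latin-1 output is not the raw byte leaking through; it is how `repr()` displays the decoded character U+008D.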

Source: https://stackoverflow.com/questions/58501530/python3-different-behaviour-between-latin-1-and-cp1252-when-decoding-unmapped-ch
