UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 0: illegal multibyte sequence

Submitted by 依然范特西╮ on 2019-12-22 09:47:17

Question


I am using Python 3.4 on 64-bit Windows 7. I ran the following code:

      6   """ load single batch of cifar """
      7   with open(filename, 'r') as f:
----> 8     datadict = pickle.load(f)
      9     X = datadict['data']

The error message is UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 0: illegal multibyte sequence

I changed line 7 to:

      6   """ load single batch of cifar """
      7   with open(filename, 'r',encoding='utf-8') as f:
----> 8     datadict = pickle.load(f)
      9     X = datadict['data']

The error message became UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte.

The traceback ends in Python34\lib\codecs.py, in decode(self, input, final):

    311         # decode input (taking the buffer into account)
    312         data = self.buffer + input
--> 313         (result, consumed) = self._buffer_decode(data, self.errors, final)
    314         # keep undecoded input until the next call
    315         self.buffer = data[consumed:]

I then changed the code to:

      6   """ load single batch of cifar """
      7   with open(filename, 'rb') as f:
----> 8     datadict = pickle.load(f)
      9     X = datadict['data']
     10     Y = datadict['labels']

This time the error is UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 6: ordinal not in range(128).

What is the problem, and how can I solve it?


Answer 1:


Pickle files are binary data files, so you always have to open the file with the 'rb' mode when loading. Don't try to use a text mode here.
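
To see why text mode cannot work, here is a minimal sketch that writes a small pickle to a temporary file and tries both modes. The dictionary contents and the temporary file are made up for illustration and merely stand in for the CIFAR batch from the question.

```python
import os
import pickle
import tempfile

# Write a small protocol 2 pickle to a temporary file.
# The contents are hypothetical, not the real CIFAR data.
with tempfile.NamedTemporaryFile(delete=False, suffix=".pkl") as tmp:
    pickle.dump({"data": [1, 2, 3]}, tmp, protocol=2)
    path = tmp.name

text_mode_error = None
try:
    # Text mode decodes the bytes before pickle ever sees them.
    # A protocol 2 pickle starts with byte 0x80, which is not valid UTF-8.
    with open(path, "r", encoding="utf-8") as f:
        pickle.load(f)
except (UnicodeDecodeError, TypeError) as exc:
    text_mode_error = exc

# Binary mode hands the raw bytes to pickle unchanged.
with open(path, "rb") as f:
    loaded = pickle.load(f)

os.remove(path)
print(type(text_mode_error).__name__)
print(loaded)
```

The 'gbk' variant of the error in the question has the same cause: without an explicit encoding, open() in text mode falls back to the platform's default encoding, which on that Windows system was GBK.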

You are trying to load a Python 2 pickle that contains string data. You'll have to tell pickle.load() how to convert that data to Python 3 strings, or to leave them as bytes.

The default is to try and decode those strings as ASCII, and that decoding fails. See the pickle.load() documentation:

Optional keyword arguments are fix_imports, encoding and errors, which are used to control compatibility support for pickle stream generated by Python 2. If fix_imports is true, pickle will try to map the old Python 2 names to the new names used in Python 3. The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. The encoding can be ‘bytes’ to read these 8-bit string instances as bytes objects.
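
The default behaviour can be reproduced without a Python 2 interpreter. The sketch below uses a hand-written pickle byte string (an assumption made for illustration, not the CIFAR file) that encodes the Python 2 dict {'data': '\x80\x8b\xff'}, and loads it with the default settings.

```python
import pickle

# Hand-written protocol 2 pickle of the Python 2 dict {'data': '\x80\x8b\xff'}.
# Opcode 'U' (SHORT_BINSTRING) is how Python 2 stores a short 8-bit str.
py2_pickle = b"\x80\x02}U\x04dataU\x03\x80\x8b\xffs."

default_error = None
try:
    # encoding defaults to 'ASCII' and errors to 'strict',
    # so any byte >= 0x80 inside a Python 2 str makes the load fail.
    pickle.loads(py2_pickle)
except UnicodeDecodeError as exc:
    default_error = exc

print(default_error)
```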

Setting the encoding to latin1 allows you to import the data directly:

with open(filename, 'rb') as f:
    datadict = pickle.load(f, encoding='latin1') 

It appears that it is the numpy array data that is causing the problems here as all strings in the set use ASCII characters only.
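
Latin-1 works here because it maps every byte value 0x00 to 0xFF to the code point with the same number, so decoding can never fail and the original bytes can be recovered. A small sketch, again using a hand-written Python 2 style pickle as a stand-in for the real file:

```python
import pickle

# Hand-written protocol 2 pickle of the Python 2 dict {'data': '\x80\x8b\xff'}
# (illustrative only, not the CIFAR batch).
py2_pickle = b"\x80\x02}U\x04dataU\x03\x80\x8b\xffs."

datadict = pickle.loads(py2_pickle, encoding="latin1")

# Keys and values are now Python 3 str objects.
value = datadict["data"]

# Encoding back with latin1 restores the original bytes exactly.
raw = value.encode("latin1")
print(raw == b"\x80\x8b\xff")
```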

The alternative would be to use encoding='bytes', but then all the filenames and top-level dictionary keys are bytes objects, and you'd have to decode those or prefix all your key literals with b.
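
A sketch of what the encoding='bytes' route looks like in practice, using a hand-written Python 2 style pickle for illustration (the byte string and the one-level key conversion are assumptions, not part of the original answer):

```python
import pickle

# Hand-written protocol 2 pickle of the Python 2 dict {'data': '\x80\x8b\xff'}
# (illustrative only, not the CIFAR batch).
py2_pickle = b"\x80\x02}U\x04dataU\x03\x80\x8b\xffs."

raw_dict = pickle.loads(py2_pickle, encoding="bytes")

# Option 1: index with bytes literals.
X = raw_dict[b"data"]

# Option 2: decode the top-level keys once and keep the values as bytes.
datadict = {key.decode("ascii"): value for key, value in raw_dict.items()}

print(raw_dict)
print(datadict)
```

The key conversion above only touches the top level; nested dictionaries or lists of Python 2 strings would need the same treatment.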




Answer 2:


If the file is UTF-8 text, open it with open(file_name, 'r', encoding='UTF-8'). If you get the GBK error, open it in binary mode with open(file_name, 'rb'). Hope this solves your problem!



Source: https://stackoverflow.com/questions/28165639/unicodedecodeerror-gbk-codec-cant-decode-byte-0x80-in-position-0-illegal-mult
