Python pickle protocol choice?

Submitted by 一世执手 on 2019-12-18 10:18:03

Question


I am using Python 2.7 and trying to pickle an object. I am wondering what the real difference is between the pickle protocols.

import numpy as np
import pickle
class data(object):
    def __init__(self):
        self.a = np.zeros((100, 37000, 3), dtype=np.float32)

d = data()
print "data size: ", d.a.nbytes/1000000.
print "highest protocol: ", pickle.HIGHEST_PROTOCOL
pickle.dump(d,open("noProt", 'w'))
pickle.dump(d,open("prot0", 'w'), protocol=0)
pickle.dump(d,open("prot1", 'w'), protocol=1)
pickle.dump(d,open("prot2", 'w'), protocol=2)


out >> data size:  44.4
out >> highest protocol:  2

Then I found that the saved files have different sizes on disk:

  • noProt: 177.6MB
  • prot0: 177.6MB
  • prot1: 44.4MB
  • prot2: 44.4MB

I know that prot0 is a human-readable text file, so I don't want to use it. I guess protocol 0 is the one used by default.

I wonder what the difference is between protocols 1 and 2. Is there a reason why I should choose one or the other?

Which is better to use, pickle or cPickle?


Answer 1:


From the pickle module data format documentation:

There are currently 3 different protocols which can be used for pickling.

  • Protocol version 0 is the original ASCII protocol and is backwards compatible with earlier versions of Python.
  • Protocol version 1 is the old binary format which is also compatible with earlier versions of Python.
  • Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes.

[...]

If a protocol is not specified, protocol 0 is used. If protocol is specified as a negative value or HIGHEST_PROTOCOL, the highest protocol version available will be used.

Stick with protocol version 2, especially if you are using custom classes derived from object (new-style classes), as most modern code does these days.

Unless you need to maintain backwards compatibility with older Python versions, it's easiest to just stick with the highest protocol version you can lay your hands on:

with open("prot2", 'wb') as pfile:
    pickle.dump(d, pfile, protocol=pickle.HIGHEST_PROTOCOL)

Because this is a binary format, make sure to use 'wb' as the file mode!
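The same applies when reading the file back: open it with 'rb'. A minimal round-trip sketch, where the dict payload and the temporary file path are stand-ins chosen for illustration rather than the question's data:

```python
import os
import pickle
import tempfile

# Stand-in payload; the question's object holds a large NumPy array instead.
payload = {"name": "example", "values": [1.0, 2.5, 3.0]}

# Hypothetical file location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "prot_highest")

# Write in binary mode ('wb') with the highest protocol available.
with open(path, "wb") as pfile:
    pickle.dump(payload, pfile, protocol=pickle.HIGHEST_PROTOCOL)

# Read in binary mode ('rb'); the protocol is detected automatically on load.
with open(path, "rb") as pfile:
    restored = pickle.load(pfile)

print(restored == payload)
```

Note that pickle.load takes no protocol argument; only the writing side has to choose one.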

cPickle and pickle are mostly compatible; the differences lie in the API offered. For most use-cases, just stick with cPickle; it is faster. Quoting the documentation again:

First, cPickle can be up to 1000 times faster than pickle because the former is implemented in C. Second, in the cPickle module the callables Pickler() and Unpickler() are functions, not classes. This means that you cannot use them to derive custom pickling and unpickling subclasses. Most applications have no need for this functionality and should benefit from the greatly improved performance of the cPickle module.
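If the same script may have to run on both Python 2 and Python 3 (an assumption, since the question only mentions 2.7), a commonly used pattern is to try cPickle first and fall back to pickle. A small sketch:

```python
try:
    import cPickle as pickle  # Python 2: the C implementation
except ImportError:
    import pickle  # Python 3: cPickle no longer exists as a separate module

# Stand-in payload for illustration.
payload = {"name": "example", "values": [1, 2, 3]}

# Protocol 2 is readable by both Python 2 (2.3+) and Python 3.
blob = pickle.dumps(payload, protocol=2)
restored = pickle.loads(blob)

print(restored == payload)
```

The rest of the code can then call pickle.dump and pickle.load without caring which module was imported.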




Answer 2:


For people using Python 3, there are, as of Python 3.5, five possible protocols to choose from:

There are currently 5 different protocols which can be used for pickling. The higher the protocol used, the more recent the version of Python needed to read the pickle produced [doc]:

  • Protocol version 0 is the original “human-readable” protocol and is backwards compatible with earlier versions of Python.
  • Protocol version 1 is an old binary format which is also compatible with earlier versions of Python.
  • Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes. Refer to PEP 307 for information about improvements brought by protocol 2.
  • Protocol version 3 was added in Python 3.0. It has explicit support for bytes objects and cannot be unpickled by Python 2.x. This is the default protocol, and the recommended protocol when compatibility with other Python 3 versions is required.
  • Protocol version 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations. Refer to PEP 3154 for information about improvements brought by protocol 4.
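To see what the protocols listed above mean in practice, one rough way is to pickle the same object with every protocol the running interpreter supports and compare the sizes. The payload below is an arbitrary stand-in, and the exact byte counts will vary with the Python version:

```python
import pickle

# Arbitrary stand-in payload; real savings depend on the data being pickled.
payload = {"values": list(range(1000)), "label": "example"}

sizes = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    blob = pickle.dumps(payload, protocol=protocol)
    # Every protocol must round-trip to an equal object.
    assert pickle.loads(blob) == payload
    sizes[protocol] = len(blob)

for protocol in sorted(sizes):
    print("protocol %d: %d bytes" % (protocol, sizes[protocol]))
```

For this payload the text-based protocol 0 comes out noticeably larger than the binary ones, which matches the file sizes reported in the question.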

A general rule is that you should use the highest possible protocol that is backward compatible with what you want to use it for. So if you want it to be backward compatible with Python 2, then protocol version 2 is a good choice; if you want it to be backward compatible with all Python versions, then version 1 is good. If you do not care about backward compatibility, then using pickle.HIGHEST_PROTOCOL automatically gives you the highest protocol for your Python version.
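As a small illustration of that rule, the sketch below writes one pickle pinned to protocol 2 and one with a negative protocol value, then reads back the version each pickle declares. It relies on the fact that pickles written with protocol 2 or higher start with the PROTO opcode (byte 0x80) followed by the protocol number; the payload is a stand-in:

```python
import pickle

payload = {"a": 1, "b": [1, 2, 3]}  # stand-in data

# Pinned to protocol 2 so that Python 2 can still read it.
py2_compatible = pickle.dumps(payload, protocol=2)

# A negative value selects pickle.HIGHEST_PROTOCOL for this interpreter.
newest = pickle.dumps(payload, protocol=-1)

# bytearray indexing yields integers on both Python 2 and Python 3.
declared_py2 = bytearray(py2_compatible)[1]
declared_newest = bytearray(newest)[1]

print("pinned pickle declares protocol %d" % declared_py2)
print("newest pickle declares protocol %d" % declared_newest)
```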

Also, in Python 3, importing pickle automatically uses the C implementation when it is available, so there is no separate cPickle module to import.

Another point to note in terms of compatibility is that, by default, protocols 3 and 4 use Unicode encoding of strings, whereas earlier protocols do not. So in Python 3, if you load a file that was pickled in Python 2, you will probably have to specify the encoding explicitly (the encoding argument of pickle.load) in order to load it properly.
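A sketch of that situation for Python 3 follows. The byte string is hand-written to stand in for what Python 2 would produce for pickle.dumps('caf\xe9') at protocol 0; it is an assumption made for illustration, not output captured from a real Python 2 run:

```python
import pickle

# Hand-written stand-in for a Python 2 protocol 0 pickle of the
# byte string 'caf\xe9' (assumed, not captured from Python 2).
py2_pickle = b"S'caf\\xe9'\np0\n."

# By default Python 3 decodes Python 2 str objects as ASCII,
# which fails for the non-ASCII byte 0xe9.
try:
    pickle.loads(py2_pickle)
    default_ok = True
except UnicodeDecodeError:
    default_ok = False

# Decode the Python 2 str as Latin-1 text...
as_text = pickle.loads(py2_pickle, encoding="latin1")

# ...or keep it as raw bytes.
as_bytes = pickle.loads(py2_pickle, encoding="bytes")

print(default_ok, repr(as_text), repr(as_bytes))
```

Which encoding is right depends on how the data was produced in Python 2; "latin1" and "bytes" are just the two options shown here.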



Source: https://stackoverflow.com/questions/23582489/python-pickle-protocol-choice
