_pickle in Python 3 doesn't work for saving large data

Anonymous (unverified), submitted 2019-12-03 02:23:02

Question:

I am trying to use _pickle to save data to disk, but calling _pickle.dump raises an error:

OverflowError: cannot serialize a bytes object larger than 4 GiB 

Is this a hard limitation of _pickle (cPickle in Python 2)?

Answer 1:

Not anymore: Python 3.4 implements PEP 3154, which introduces pickle protocol version 4.
https://www.python.org/dev/peps/pep-3154/

But you need to say you want to use version 4 of the protocol:
https://docs.python.org/3/library/pickle.html

    with open("file", "wb") as f:  # pickle writes bytes, so open in binary mode
        pickle.dump(d, f, protocol=4)
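
As a slightly fuller sketch of the call above, a save/load round trip might look like the following. The helper names save_pickle and load_pickle and the temporary file path are made up for illustration, and the small payload only stands in for the multi-GiB object from the question.

```python
import os
import pickle
import tempfile


def save_pickle(obj, path, protocol=4):
    """Write obj to path with the given pickle protocol (4 needs Python 3.4+)."""
    # Pickle data is binary, so the file has to be opened in "wb" mode.
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=protocol)


def load_pickle(path):
    """Read an object back; pickle.load detects the protocol on its own."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Hypothetical small payload standing in for the real large object.
    data = {"payload": b"x" * 1024}
    path = os.path.join(tempfile.mkdtemp(), "data.pkl")
    save_pickle(data, path)
    assert load_pickle(path) == data
```

Loading needs no protocol argument, so only the writing side has to change.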


Answer 2:

Yes, this is a hard-coded limit; from the save_bytes function:

    else if (size <= 0xffffffffL) {
        // ...
    }
    else {
        PyErr_SetString(PyExc_OverflowError,
                        "cannot serialize a bytes object larger than 4 GiB");
        return -1;          /* string too large */
    }

The protocol uses 4 bytes to write the size of the object to disk, which means it can only track sizes of up to 2^32 bytes == 4 GiB.
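
The arithmetic behind that limit can be checked with a short sketch. It uses the struct module purely as an illustration of a 4-byte unsigned length field; it is not the code path pickle itself runs, and the helper name fits_in_4_bytes is made up for this example.

```python
import struct

# Largest value an unsigned 4-byte length field can hold (0xffffffff).
MAX_4_BYTE_LENGTH = 2**32 - 1

# 2**32 bytes is exactly 4 GiB, since 1 GiB == 2**30 bytes.
assert 2**32 == 4 * 2**30


def fits_in_4_bytes(n):
    """Return True if n can be packed as a little-endian unsigned 32-bit int."""
    try:
        struct.pack("<I", n)
        return True
    except struct.error:
        return False


if __name__ == "__main__":
    print(fits_in_4_bytes(MAX_4_BYTE_LENGTH))      # the maximum still fits
    print(fits_in_4_bytes(MAX_4_BYTE_LENGTH + 1))  # one more byte does not
    # An 8-byte length field ("<Q"), as used by protocol 4 for large
    # bytes objects, has room for the larger size.
    print(len(struct.pack("<Q", MAX_4_BYTE_LENGTH + 1)))
```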

If you can break up the bytes object into multiple objects, each smaller than 4GB, you can still save the data to a pickle, of course.
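
One way to do that splitting is sketched below, assuming the data is a single bytes-like object. The function names, the chunk count written up front, and the default chunk size of 2 GiB are all choices made for this example, not part of the pickle API.

```python
import io
import pickle


def dump_bytes_in_chunks(data, f, chunk_size=2**31, protocol=3):
    """Pickle data as a chunk count followed by slices of at most chunk_size bytes."""
    view = memoryview(data)  # slicing a memoryview avoids copying up front
    n_chunks = (len(view) + chunk_size - 1) // chunk_size
    pickle.dump(n_chunks, f, protocol=protocol)
    for start in range(0, len(view), chunk_size):
        # Each slice stays below the 4 GiB limit of the older protocols.
        pickle.dump(bytes(view[start:start + chunk_size]), f, protocol=protocol)


def load_bytes_in_chunks(f):
    """Read back the chunk count, then join the chunks into one bytes object."""
    n_chunks = pickle.load(f)
    return b"".join(pickle.load(f) for _ in range(n_chunks))


if __name__ == "__main__":
    # Tiny chunk size so the demo runs instantly; real use keeps the default.
    buf = io.BytesIO()
    dump_bytes_in_chunks(b"abcdefghij", buf, chunk_size=3)
    buf.seek(0)
    assert load_bytes_in_chunks(buf) == b"abcdefghij"
```

Several pickles can sit back to back in one file, because each pickle.load call stops at the end of the object it read.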



Answer 3:

The answers above do a great job of explaining why pickle fails. But the problem is still unsolved on Python 2.7, which matters if you are still on Python 2.7 and want to support large files, especially with NumPy (NumPy arrays over 4 GB fail).

You can use OC serialization, which has been updated to work for data over 4 GB. There is a Python C extension module available from:

http://www.picklingtools.com/Downloads

Take a look at the Documentation:

http://www.picklingtools.com/html/faq.html#python-c-extension-modules-new-as-of-picklingtools-1-6-0-and-1-3-3

But here's a quick summary: there are ocdumps and ocloads, which work very much like pickle's dumps and loads:

    from pyocser import ocdumps, ocloads
    ser = ocdumps(pyobject)     # Serialize pyobject into string ser
    pyobject = ocloads(ser)     # Deserialize from string ser into pyobject

OC serialization is 1.5-2x faster and also works with C++ (if you are mixing languages). It works with all built-in types, but not classes (partly because it is cross-language and it's hard to build C++ classes from Python).


