Loading a large dictionary using python pickle

[愿得一人] 2020-12-15 09:41

I have a full inverted index in the form of a nested Python dictionary. Its structure is:

{word : { doc_name : [location_list] } }

For example l

5 answers
  •  庸人自扰
    2020-12-15 09:51

    A common pattern in Python 2.x is to have one version of a module implemented in pure Python, with an optional accelerated version implemented as a C extension; for example, pickle and cPickle. This places the burden of importing the accelerated version and falling back on the pure Python version on each user of these modules. In Python 3.0, the accelerated versions are considered implementation details of the pure Python versions. Users should always import the standard version, which attempts to import the accelerated version and falls back to the pure Python version. The pickle / cPickle pair received this treatment.
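
    The fallback idiom described above can be sketched as follows. This is a minimal illustration, not code from the question: the `data` dictionary is a made-up sample, and on Python 3 the `except` branch is the one that runs.

```python
# Python 2 idiom: prefer the C implementation, fall back to pure Python.
try:
    import cPickle as pickle  # exists only on Python 2; ImportError on Python 3
except ImportError:
    import pickle  # Python 3: uses the C accelerator internally when available

# Hypothetical sample shaped like the inverted index in the question.
data = {"word": {"doc1": [1, 5, 9]}}

# Round-trip through bytes to confirm the chosen module works.
restored = pickle.loads(pickle.dumps(data))
```

    On Python 3 a plain `import pickle` is enough; the try/except is only worth keeping in code that must also run on Python 2.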

    • Protocol version 0 is the original “human-readable” protocol and is backwards compatible with earlier versions of Python.
    • Protocol version 1 is an old binary format which is also compatible with earlier versions of Python.
    • Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes. Refer to PEP 307 for information about improvements brought by protocol 2.
    • Protocol version 3 was added in Python 3.0. It has explicit support for bytes objects and cannot be unpickled by Python 2.x. It was the default protocol in Python 3.0 through 3.7 (Python 3.8 made protocol 4 the default), and it is the recommended protocol when compatibility with other Python 3 versions is required.
    • Protocol version 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations. Refer to PEP 3154 for information about improvements brought by protocol 4.
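
    To see how the protocols listed above behave on a given interpreter, a small sketch like the following may help. The `index` sample is hypothetical, and the sizes printed will vary with the data and the Python version, so none are quoted here.

```python
import pickle

# Hypothetical sample shaped like {word: {doc_name: [location_list]}}.
index = {
    "python": {"doc1.txt": [0, 14, 52], "doc2.txt": [7]},
    "pickle": {"doc1.txt": [3]},
}

# Serialized size under every protocol this interpreter supports.
sizes = {
    proto: len(pickle.dumps(index, protocol=proto))
    for proto in range(pickle.HIGHEST_PROTOCOL + 1)
}

for proto, size in sizes.items():
    print(f"protocol {proto}: {size} bytes")

# Which protocol is used when none is passed, and the newest one available.
print("default:", pickle.DEFAULT_PROTOCOL, "highest:", pickle.HIGHEST_PROTOCOL)
```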

    If your dictionary is huge and only needs to be compatible with Python 3.4 or higher, use:

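    import pickle
    pickle.dump(obj, file, protocol=4)  # file must be opened in binary write mode ("wb")
    obj = pickle.load(file)             # file must be opened in binary read mode ("rb")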
    

    or:

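    from pickle import Pickler, Unpickler
    Pickler(file, 4).dump(obj)     # second argument is the protocol version
    obj = Unpickler(file).load()   # the protocol is detected automatically on load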
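
    Put together as a complete round trip, the calls above might look like this. It is a sketch under assumptions: the `index` contents and the temporary file name `index.pkl` are made up for illustration.

```python
import os
import pickle
import tempfile

# Hypothetical sample shaped like {word: {doc_name: [location_list]}}.
index = {
    "python": {"doc1.txt": [0, 14, 52], "doc2.txt": [7]},
    "pickle": {"doc1.txt": [3]},
}

# Write to a throwaway directory so the example leaves no stray files behind.
path = os.path.join(tempfile.mkdtemp(), "index.pkl")

# Pickle files are binary: open with "wb" to write and "rb" to read.
with open(path, "wb") as f:
    pickle.dump(index, f, protocol=4)

with open(path, "rb") as f:
    loaded = pickle.load(f)
```

    The `with` blocks make sure the file is flushed and closed before it is read back.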
    

    That said, in 2010 the json module was 25 times faster at encoding and 15 times faster at decoding simple types than pickle. My 2014 benchmark ranks them by speed as marshal > pickle > json, but the marshal format is tied to specific Python versions, so its files may not load under a different interpreter version.
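
    Since the figures above are years old and depend on the interpreter and the data, it may be worth measuring on your own machine. The following is a rough sketch rather than the benchmark referred to above: the synthetic `index`, its size, and the `bench` helper are all assumptions made for illustration.

```python
import json
import marshal
import pickle
import timeit

# Synthetic inverted index: 500 words x 5 documents x 5 positions (made-up data).
index = {
    f"word{i}": {f"doc{j}": list(range(5)) for j in range(5)}
    for i in range(500)
}

def bench(name, dumps, loads, repeat=5):
    """Return (name, mean encode seconds, mean decode seconds)."""
    blob = dumps(index)
    assert loads(blob) == index  # sanity check: the round trip is lossless
    enc = timeit.timeit(lambda: dumps(index), number=repeat) / repeat
    dec = timeit.timeit(lambda: loads(blob), number=repeat) / repeat
    return name, enc, dec

results = [
    bench("marshal", marshal.dumps, marshal.loads),
    bench("pickle", lambda o: pickle.dumps(o, protocol=4), pickle.loads),
    bench("json", json.dumps, json.loads),
]

for name, enc, dec in results:
    print(f"{name:8s} encode {enc * 1e3:8.3f} ms   decode {dec * 1e3:8.3f} ms")
```

    Note that json only round-trips this structure because every key is a string; integer keys, for example, would come back as strings.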
