How to reduce the time taken to load a pickle file in Python

星月不相逢 2020-12-08 02:47

I created a dictionary in Python and dumped it to a pickle file. The file is 300MB. Now I want to load that same pickle.

output = open('myfile.pkl', 'rb


        
3 Answers
  •  予麋鹿 (original poster)
    2020-12-08 03:13

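    Try using the json library instead of pickle. This should be an option in your case because you're dealing with a dictionary, which is a relatively simple object. It only works if the keys are strings and the values are JSON-serializable (strings, numbers, booleans, None, lists, and nested dictionaries).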

    According to this website,

    JSON is 25 times faster in reading (loads) and 15 times faster in writing (dumps).

    Also see this question: What is faster - Loading a pickled dictionary object or Loading a JSON file - to a dictionary?
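As a rough sketch of the JSON suggestion above: the following writes a dictionary to a JSON file and reads it back. It assumes string keys and JSON-serializable values. The file name `myfile.json`, the helper names, and the sample data are hypothetical stand-ins for the real 300MB dictionary.

```python
import json
import os
import tempfile


def save_dict_as_json(data, path):
    """Write a dict to `path` as JSON (keys must be strings)."""
    with open(path, "w", encoding="utf8") as f:
        json.dump(data, f)


def load_dict_from_json(path):
    """Read a JSON file back into a dict."""
    with open(path, "r", encoding="utf8") as f:
        return json.load(f)


if __name__ == "__main__":
    # Hypothetical sample data; replace with the real dictionary.
    sample = {"a": [1, 2, 3], "b": {"nested": 1.5}, "c": "text"}
    path = os.path.join(tempfile.gettempdir(), "myfile.json")
    save_dict_as_json(sample, path)
    assert load_dict_from_json(path) == sample
```

Note that the round trip is not always exact: JSON turns non-string keys into strings and tuples into lists, so a dictionary that relies on those types may need pickle or marshal instead.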

    Upgrading Python or using the marshal module with a fixed Python version also helps boost speed (code adapted from here):

    try:
        import cPickle  # Python 2: C-accelerated pickle
    except ImportError:
        import pickle as cPickle  # Python 3: pickle already uses the C accelerator
    import pickle
    import json, marshal, random
    from time import time
    from hashlib import md5
    
    test_runs = 1000
    
    if __name__ == "__main__":
        payload = {
            "float": [(random.randrange(0, 99) + random.random()) for i in range(1000)],
            "int": [random.randrange(0, 9999) for i in range(1000)],
            "str": [md5(str(random.random()).encode('utf8')).hexdigest() for i in range(1000)]
        }
        modules = [json, pickle, cPickle, marshal]
    
        for payload_type in payload:
            data = payload[payload_type]
            for module in modules:
                start = time()
                if module.__name__ in ['pickle', 'cPickle']:
                    for i in range(test_runs): serialized = module.dumps(data, protocol=-1)
                else:
                    for i in range(test_runs): serialized = module.dumps(data)
                w = time() - start
                start = time()
                for i in range(test_runs):
                    unserialized = module.loads(serialized)
                r = time() - start
                print("%s %s W %.3f R %.3f" % (module.__name__, payload_type, w, r))
    

    Results:

    C:\Python27\python.exe -u "serialization_benchmark.py"
    json int W 0.125 R 0.156
    pickle int W 2.808 R 1.139
    cPickle int W 0.047 R 0.046
    marshal int W 0.016 R 0.031
    json float W 1.981 R 0.624
    pickle float W 2.607 R 1.092
    cPickle float W 0.063 R 0.062
    marshal float W 0.047 R 0.031
    json str W 0.172 R 0.437
    pickle str W 5.149 R 2.309
    cPickle str W 0.281 R 0.156
    marshal str W 0.109 R 0.047
    
    C:\pypy-1.6\pypy-c -u "serialization_benchmark.py"
    json int W 0.515 R 0.452
    pickle int W 0.546 R 0.219
    cPickle int W 0.577 R 0.171
    marshal int W 0.032 R 0.031
    json float W 2.390 R 1.341
    pickle float W 0.656 R 0.436
    cPickle float W 0.593 R 0.406
    marshal float W 0.327 R 0.203
    json str W 1.141 R 1.186
    pickle str W 0.702 R 0.546
    cPickle str W 0.828 R 0.562
    marshal str W 0.265 R 0.078
    
    c:\Python34\python -u "serialization_benchmark.py"
    json int W 0.203 R 0.140
    pickle int W 0.047 R 0.062
    pickle int W 0.031 R 0.062
    marshal int W 0.031 R 0.047
    json float W 1.935 R 0.749
    pickle float W 0.047 R 0.062
    pickle float W 0.047 R 0.062
    marshal float W 0.047 R 0.047
    json str W 0.281 R 0.187
    pickle str W 0.125 R 0.140
    pickle str W 0.125 R 0.140
    marshal str W 0.094 R 0.078
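
The marshal rows above come from in-memory `dumps`/`loads` calls. A minimal sketch of the file-based equivalent for a whole dictionary might look like this. It assumes the dictionary holds only built-in types that marshal supports (numbers, strings, bytes, lists, tuples, dicts, and similar) and that the same Python version writes and reads the file, since the marshal format is not guaranteed to be stable across versions. The helper names are hypothetical.

```python
import marshal


def save_dict_with_marshal(data, path):
    """Serialize `data` to `path` using the marshal format."""
    with open(path, "wb") as f:
        marshal.dump(data, f)


def load_dict_with_marshal(path):
    """Load an object previously written by save_dict_with_marshal."""
    with open(path, "rb") as f:
        return marshal.load(f)
```

As with pickle, marshal data should only be loaded from sources you trust.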
    

    Python 3.4 uses pickle protocol 3 by default, which showed no difference compared with protocol 4. In Python 2 the highest pickle protocol is 2 (selected when a negative value is passed to dump), and it is twice as slow as protocol 3.
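
To check the protocol point on your own data, here is a hedged sketch for Python 3 that serializes the same object with every available pickle protocol and times the loads. Timings depend on the machine and the data, so no expected numbers are given. The function name `time_pickle_loads` and the sample payload are hypothetical.

```python
import pickle
import time


def time_pickle_loads(data, runs=20):
    """Return {protocol: seconds} for `runs` loads of `data` per protocol."""
    timings = {}
    for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
        blob = pickle.dumps(data, protocol=protocol)
        start = time.perf_counter()
        for _ in range(runs):
            restored = pickle.loads(blob)
        timings[protocol] = time.perf_counter() - start
        # Sanity check: every protocol must round-trip the data unchanged.
        assert restored == data
    return timings


if __name__ == "__main__":
    # Hypothetical sample payload; replace with the real dictionary.
    sample = {str(i): [i, float(i), str(i)] for i in range(10000)}
    for protocol, seconds in sorted(time_pickle_loads(sample).items()):
        print("protocol %d: %.3f s" % (protocol, seconds))
```

If one protocol is clearly faster for your dictionary, re-dumping the file once with `pickle.dump(data, f, protocol=...)` makes every later load benefit from it.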
