How to Reduce the time taken to load a pickle file in python

后端未结

关注

 3  786

星月不相逢 2020-12-08 02:47

I have created a dictionary in python and dumped into pickle. Its size went to 300MB. Now, I want to load the same pickle.

output = open(\'myfile.pkl\', \'rb


      
      
        
          3条回答        

        
                    
            
            
                         
                
              
              
                
                   予麋鹿
                                             
                
                
                (楼主)
            
              
              
                2020-12-08 03:13
              

            
            
                        
Try using the json library instead of pickle. This should be an option in your case because you're dealing with a dictionary which is a relatively simple object.

According to this website,


  JSON is 25 times faster in reading (loads) and 15 times faster in
  writing (dumps).


Also see this question: What is faster - Loading a pickled dictionary object or Loading a JSON file - to a dictionary?

Upgrading Python or using the marshal module with a fixed Python version also helps boost speed (code adapted from here):

try: import cPickle
except: import pickle as cPickle
import pickle
import json, marshal, random
from time import time
from hashlib import md5

test_runs = 1000

if __name__ == "__main__":
    payload = {
        "float": [(random.randrange(0, 99) + random.random()) for i in range(1000)],
        "int": [random.randrange(0, 9999) for i in range(1000)],
        "str": [md5(str(random.random()).encode('utf8')).hexdigest() for i in range(1000)]
    }
    modules = [json, pickle, cPickle, marshal]

    for payload_type in payload:
        data = payload[payload_type]
        for module in modules:
            start = time()
            if module.__name__ in ['pickle', 'cPickle']:
                for i in range(test_runs): serialized = module.dumps(data, protocol=-1)
            else:
                for i in range(test_runs): serialized = module.dumps(data)
            w = time() - start
            start = time()
            for i in range(test_runs):
                unserialized = module.loads(serialized)
            r = time() - start
            print("%s %s W %.3f R %.3f" % (module.__name__, payload_type, w, r))


Results:

C:\Python27\python.exe -u "serialization_benchmark.py"
json int W 0.125 R 0.156
pickle int W 2.808 R 1.139
cPickle int W 0.047 R 0.046
marshal int W 0.016 R 0.031
json float W 1.981 R 0.624
pickle float W 2.607 R 1.092
cPickle float W 0.063 R 0.062
marshal float W 0.047 R 0.031
json str W 0.172 R 0.437
pickle str W 5.149 R 2.309
cPickle str W 0.281 R 0.156
marshal str W 0.109 R 0.047

C:\pypy-1.6\pypy-c -u "serialization_benchmark.py"
json int W 0.515 R 0.452
pickle int W 0.546 R 0.219
cPickle int W 0.577 R 0.171
marshal int W 0.032 R 0.031
json float W 2.390 R 1.341
pickle float W 0.656 R 0.436
cPickle float W 0.593 R 0.406
marshal float W 0.327 R 0.203
json str W 1.141 R 1.186
pickle str W 0.702 R 0.546
cPickle str W 0.828 R 0.562
marshal str W 0.265 R 0.078

c:\Python34\python -u "serialization_benchmark.py"
json int W 0.203 R 0.140
pickle int W 0.047 R 0.062
pickle int W 0.031 R 0.062
marshal int W 0.031 R 0.047
json float W 1.935 R 0.749
pickle float W 0.047 R 0.062
pickle float W 0.047 R 0.062
marshal float W 0.047 R 0.047
json str W 0.281 R 0.187
pickle str W 0.125 R 0.140
pickle str W 0.125 R 0.140
marshal str W 0.094 R 0.078


Python 3.4 uses pickle protocol 3 as default, which gave no difference compared to protocol 4. Python 2 has protocol 2 as highest pickle protocol (selected if negative value is provided to dump), which is twice as slow as protocol 3.
    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它3个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复