Here's the program:
#!/usr/bin/python
import multiprocessing

def dummy_func(r):
    # Callback invoked in the main process for each completed result
    pass

def worker():
    # Placeholder for the per-point processing
    pass

if __name__ == '__main__':
    pool = multiprocessing.Pool()
    # ... jobs are submitted with pool.apply_async(worker, callback=dummy_func)
    # in the main loop described below
I have a very large 3d point cloud data set that I'm processing. I tried using the multiprocessing module to speed up the processing, but I started getting out-of-memory errors. After some research and testing, I determined that I was filling the queue of tasks to be processed much more quickly than the subprocesses could empty it. I'm sure that by chunking, or by using map_async or something similar, I could have adjusted the load, but I didn't want to make major changes to the surrounding logic.
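For reference, here is a minimal sketch of the chunking alternative I decided against; process_point and the points generator are hypothetical stand-ins, assuming the work can be expressed as one function applied to each point:
import multiprocessing

def process_point(point):
    # Hypothetical per-point work
    return point

if __name__ == '__main__':
    points = ((i, i, i) for i in range(1000000))  # stand-in for the real data
    pool = multiprocessing.Pool()
    # imap_unordered consumes the iterable lazily, so the task queue stays
    # small instead of holding every pending job at once
    for result in pool.imap_unordered(process_point, points, chunksize=1000):
        pass  # consume results as they arrive
    pool.close()
    pool.join()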
The dumb solution I hit on is to check the length of pool._cache intermittently, and if the cache grows too large, wait for the queue to empty.
In my main loop, I already had a counter and a status ticker:
# Update status
count += 1
if count % 10000 == 0:
    sys.stdout.write('.')
    if len(pool._cache) > 1e6:
        print("waiting for cache to clear...")
        last.wait()  # Where last is assigned the latest ApplyResult
So after every 10,000 insertions into the pool, I check whether more than 1 million operations are queued (about 1 GB of memory used in the main process). When the queue is that full, I just wait for the most recently inserted job to finish.
Now my program can run for hours without running out of memory. The main process just pauses occasionally while the workers continue processing the data.
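Putting it together, here is a minimal self-contained sketch of the throttling pattern; process_point and the synthetic point generator are hypothetical stand-ins for my actual processing code:
import sys
import multiprocessing

def process_point(point):
    # Hypothetical per-point work
    return point

def iter_points():
    # Stand-in generator for reading the real point cloud
    for i in range(10000000):
        yield (i, i, i)

if __name__ == '__main__':
    pool = multiprocessing.Pool()
    count = 0
    last = None
    for point in iter_points():
        last = pool.apply_async(process_point, (point,))
        # Update status
        count += 1
        if count % 10000 == 0:
            sys.stdout.write('.')
            if len(pool._cache) > 1e6:
                print("waiting for cache to clear...")
                # Jobs finish roughly in submission order, so waiting on
                # the newest one drains nearly the whole cache
                last.wait()
    pool.close()
    pool.join()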
BTW, the _cache member is documented in the multiprocessing module's pool example:
#
# Check there are no outstanding tasks
#
assert not pool._cache, 'cache = %r' % pool._cache