Pandas memory error

2020-12-19 20:50

I have a csv file with ~50,000 rows and 300 columns. The following operation causes a memory error in pandas (Python):

merged_df.stack(0).reset_index(1)


        
2 Answers
  • 2020-12-19 21:29

    On my 64-bit Linux machine (32GB of memory), this takes a little less than 2GB:

    In [4]: import numpy as np
       ...: from pandas import DataFrame
       ...: %load_ext memory_profiler   # provides the %memit magic

    In [5]: def f():
       ...:     df = DataFrame(np.random.randn(50000, 300))
       ...:     df.stack().reset_index(1)

    In [6]: %memit f()
    maximum of 1: 1791.054688 MB per loop
    

    Since you didn't specify your platform: this won't work at all on 32-bit (you usually can't allocate a 2GB contiguous block), but it should work if you have a reasonable amount of memory/swap.
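
    If you want to gauge the footprint up front, pandas can report it directly. A minimal sketch (the shape and random float64 data are assumptions mirroring the question's dimensions):

    import numpy as np
    import pandas as pd

    # ~50,000 rows x 300 float64 columns ~= 50000 * 300 * 8 bytes ~= 114 MB
    # for the base frame alone; stack()/reset_index() allocate temporaries
    # on top of that, which is why the measured peak is much larger.
    df = pd.DataFrame(np.random.randn(50000, 300))
    print(df.memory_usage(deep=True).sum() / 1024**2, "MB")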

  • 2020-12-19 21:36

    As an alternative approach, you can use the "dask" library, e.g.:

    # Dask dataframes implement the Pandas API
    import dask.dataframe as dd

    df = dd.read_csv('s3://.../2018-*-*.csv')
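
    A brief usage note: dask evaluates lazily, so read_csv only builds a task graph and data is streamed in chunks when you call .compute(), which is why the full frame never has to fit in memory at once. A minimal sketch, assuming a local file 'data.csv' (hypothetical path):

    import dask.dataframe as dd

    df = dd.read_csv('data.csv')   # lazy: builds a task graph, loads nothing yet
    means = df.mean()              # still lazy: no data has been read
    print(means.compute())         # executes in chunks; returns a pandas Series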
    