R reading a huge csv

慢半拍i  2020-12-23 23:12

I have a huge csv file. Its size is around 9 GB. I have 16 GB of RAM. I followed the advice from the page and implemented it below.

If you get the error

5 Answers

  •  眼角桃花  2020-12-23 23:36

    This may not be possible on your computer. In certain cases, data.table takes up more space than its .csv counterpart.

    library(data.table)

    DT <- data.table(x = sample(1:2, 10000000, replace = TRUE))
    write.csv(DT, "test.csv", row.names = FALSE)  # 29 MB file
    DT <- fread("test.csv")  # note: fread() has no row.names argument
    object.size(DT)
    > 40001072 bytes  # 40 MB
    

    Two orders of magnitude larger:

    DT <- data.table(x = sample(1:2, 1000000000, replace = TRUE))
    write.csv(DT, "test.csv", row.names = FALSE)  # 2.92 GB file
    DT <- fread("test.csv")
    object.size(DT)
    > 4000001072 bytes  # 4.00 GB
    

    There is natural overhead to storing an object in R. Based on these numbers, there is roughly a 1.33x expansion factor (R : csv) when reading files. However, this varies with the data; see the sketch after this list. For example, using

    • x = sample(1:10000000, 10000000, replace = TRUE) gives a factor of roughly 2x (R : csv).

    • x = sample(c("foofoofoo","barbarbar"), 10000000, replace = TRUE) gives a factor of roughly 0.5x (R : csv).
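
    If you want to know the expansion factor for your own data before committing to a full read, one option is to measure it on a sample. The sketch below is illustrative rather than part of the original answer: ratio_for() is a hypothetical helper, and it uses fwrite(), which never writes row names, so the ratios will differ slightly from the write.csv() numbers above.

    library(data.table)

    # in-memory size of a one-column data.table divided by its on-disk csv size
    ratio_for <- function(x, path = tempfile(fileext = ".csv")) {
      DT <- data.table(x = x)
      fwrite(DT, path)
      as.numeric(object.size(DT)) / file.size(path)
    }

    ratio_for(sample(1:2, 1e6, replace = TRUE))                          # small integers
    ratio_for(sample(1:10000000, 1e6, replace = TRUE))                   # wider integers
    ratio_for(sample(c("foofoofoo", "barbarbar"), 1e6, replace = TRUE))  # longer strings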

    Based on the worst case above, your 9 GB file could take 18 GB of memory to store in R, if not more. Based on your error message, it is far more likely that you are hitting hard memory constraints than an allocation issue. Therefore, just reading your file in chunks and consolidating would not work - you would also need to partition your analysis and workflow. Another alternative is to keep the data out of memory and use an on-disk tool such as a SQL database, as sketched below.
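
    A minimal sketch of that route, assuming the data.table, DBI, and RSQLite packages: it loads the csv into an on-disk SQLite file in chunks, so only one chunk is ever held in RAM, and then runs the heavy aggregation inside the database. The file names ("huge.csv", "big.sqlite"), the table name, and the chunk size are placeholders, not from the question.

    library(data.table)
    library(DBI)
    library(RSQLite)

    con <- dbConnect(RSQLite::SQLite(), "big.sqlite")

    header     <- names(fread("huge.csv", nrows = 0))  # read only the column names
    chunk_rows <- 1e6                                  # tune so one chunk fits comfortably in RAM
    offset     <- 0

    repeat {
      chunk <- tryCatch(
        fread("huge.csv",
              skip      = offset + 1,   # +1 skips the header line
              nrows     = chunk_rows,
              header    = FALSE,
              col.names = header),
        error = function(e) data.table()  # skipping past end of file -> treat as empty chunk
      )
      if (nrow(chunk) == 0) break

      # append the chunk to an on-disk table instead of keeping it in R's memory
      dbWriteTable(con, "big_table", chunk,
                   append = dbExistsTable(con, "big_table"))
      offset <- offset + nrow(chunk)
    }

    # the aggregation runs inside SQLite; only the small result comes back into R
    res <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM big_table")
    dbDisconnect(con)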
