Why is reading rows faster than reading columns?

粉色の甜心 2021-02-04 12:19

I am analysing a dataset with 200 rows and 1200 columns, stored in a .CSV file. To process it, I read this file using R's read.csv().

2 Answers
  •  耶瑟儿~
    2021-02-04 12:31

    Wide data sets are typically slower to read into memory than long data sets (i.e., the transposed layout). This affects many programs that read data, such as R, Python, and Excel, though the points below are most pertinent to R:

    • R needs to allocate memory for each cell, even if it is NA. This means that every column has at least as many cells as the number of rows in the CSV file, whereas in a long dataset you can potentially drop the NA values and save some space.
    • R has to guess the data type for each value and make sure it's consistent with the data type of the column, which also introduces overhead; the more columns there are, the more per-column guessing work there is (see the sketch after this list).
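
    For illustration, here is a minimal sketch of the effect. The file names wide.csv and long.csv are made up for the example; they simply hold the same 200 x 1200 numbers in the original and transposed layouts:

        # Build one 200 x 1200 numeric table and write it in both orientations
        set.seed(1)
        m <- matrix(rnorm(200 * 1200), nrow = 200, ncol = 1200)
        write.csv(m,    "wide.csv", row.names = FALSE)   # 200 rows, 1200 columns
        write.csv(t(m), "long.csv", row.names = FALSE)   # 1200 rows, 200 columns

        # Reading the wide file means 1200 columns to allocate and type-check,
        # versus only 200 for the transposed file
        system.time(read.csv("wide.csv"))
        system.time(read.csv("long.csv"))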

    Since your dataset doesn't appear to contain any NA values, my hunch is that you're seeing the speed improvement because of the second point. You can test this theory by passing colClasses = rep('numeric', 1200) to read.csv or fread for the 1200-column data set, or rep('numeric', 200) for the transposed 200-column one, which should decrease the overhead of guessing data types.
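
    For example, a rough way to run that test, reusing the made-up wide.csv and long.csv files from the sketch above (fread comes from the data.table package):

        # Declaring every column as numeric skips the type-guessing step
        system.time(read.csv("wide.csv", colClasses = rep("numeric", 1200)))
        system.time(read.csv("long.csv", colClasses = rep("numeric", 200)))

        # The same argument works with data.table::fread
        library(data.table)
        system.time(fread("wide.csv", colClasses = rep("numeric", 1200)))

    If the gap between the wide and long timings shrinks once colClasses is supplied, most of the original difference came from per-column type guessing rather than from the raw amount of data.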
