Fill option for fread


一个人的身影 2020-12-01 16:58

Let's say I have this txt file:

"AA",3,3,3,3
"CC","ad",2,2,2,2,2
"ZZ",2
"AA",3,3,3,3
"CC","ad",2,2,2,2,2

With read.csv


      
      
        
2 answers

执念已碎 (original poster) 2020-12-01 17:29
              

            
            
                        
Major update

It looks like development plans for fread changed and fread has now gained a fill argument.

Using the same sample data from the end of this answer, here's what I get:

library(data.table)
packageVersion("data.table")
# [1] ‘1.9.7’
fread(x, fill = TRUE)
#    V1 V2 V3 V4 V5 V6 V7
# 1: AA  3  3  3  3 NA NA
# 2: CC ad  2  2  2  2  2
# 3: ZZ  2 NA NA NA NA NA
# 4: AA  3  3  3  3 NA NA
# 5: CC ad  2  2  2  2  2


The output above comes from the development version of "data.table" (1.9.7), which you can install with:

install.packages("data.table", 
                 repos = "https://Rdatatable.github.io/data.table", 
                 type = "source")


Original answer

This doesn't answer your question about fread; @Matt has already addressed that.

It does, however, give you an alternative to consider that should give you good speed improvements over base R's read.csv.

Unlike fread, you will have to help these functions out a little by providing them with some information about the data you are trying to read.

You can use the input.file function from "iotools". By specifying the column types, you can tell the formatter function how many columns to expect. 

library(iotools)
input.file(x, formatter = dstrsplit, sep = ",",
           col_types = rep("character", max(count.fields(x, ","))))
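
The max(count.fields(x, ",")) part of that call is plain base R, so it can be tried without "iotools". The sketch below is only an illustration, not part of the original approach: it rebuilds the sample file so that it runs on its own, counts the fields on each line, and then hands the widest count to read.csv through col.names. The names n_fields, n_cols and df, and the V1..V7 column names (chosen to mirror fread's defaults), are assumptions made for this example.

```r
# Recreate the sample file so this sketch is self-contained
x <- tempfile()
myvec <- c('"AA",3,3,3,3', '"CC","ad",2,2,2,2,2', '"ZZ",2',
           '"AA",3,3,3,3', '"CC","ad",2,2,2,2,2')
cat(myvec, file = x, sep = "\n")

# Number of comma-separated fields on each line of the file
n_fields <- count.fields(x, sep = ",")
n_fields
# [1] 5 7 2 5 7

# The widest line decides how many columns are needed
n_cols <- max(n_fields)

# Base R fallback: pad short rows, with the column count fixed up front
df <- read.csv(x, header = FALSE, fill = TRUE,
               col.names = paste0("V", seq_len(n_cols)))
dim(df)
# [1] 5 7
```

Without col.names, read.csv works out the number of columns from the first five lines only, so a wider line further down a larger file may not be read as a single row. Fixing the column count up front avoids that, at the cost of the extra pass over the file that count.fields makes.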


Sample data

x <- tempfile()
myvec <- c('"AA",3,3,3,3', '"CC","ad",2,2,2,2,2', '"ZZ",2', '"AA",3,3,3,3', '"CC","ad",2,2,2,2,2')
cat(myvec, file = x, sep = "\n")

## Uncomment for bigger sample data
## cat(rep(myvec, 200000), file = x, sep = "\n")

    
             
                                                        
            
            
              
                