Read csv file with hidden or invisible character ^M


死守一世寂寞 2021-01-25 08:48

I am attempting unsuccessfully to read a *.csv file containing hidden or invisible characters. The file contents are shown here:

my.data2 <- read.table(text


      
      
        
3 answers

离开以前 (OP) 2021-01-25 09:43
              

            
            
                        
Here's a solution using scan to read the data, matrix to structure it, and data.frame to make it into a data frame:

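readF <- function(path, nfields = 4) {
    # scan() reads whitespace-delimited items as character strings
    items <- scan(path, what = "")
    # strip the commas left attached to the items, then fill row by row
    m <- matrix(gsub(",", "", items), ncol = nfields, byrow = TRUE)
    # every row after the first is data; drop = FALSE keeps a single row a matrix
    d <- data.frame(m[-1, , drop = FALSE])
    # the first row holds the column names
    names(d) <- m[1, ]
    d
}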


First, check that the file reproduces your problem:

> read.csv("./invisible.delimiter2.csv")
            Common.name    Scientific.name Stuff1 Stuff2
1         Greylag.Goose        Anser.anser              
2                   AAC                 rr              
3            Snow.Goose                                 
4    Anser.caerulescens                                 
5                   AAC                 rr              
6  Greater.Canada.Goose  Branta.canadensis    AAC     rr
7        Barnacle.Goose   Branta.leucopsis              
8                   AAC                 rr              
9           Brent.Goose    Branta.bernicla              
10                  AAC                 rr        


and then see if my function solves it:

> readF("./invisible.delimiter2.csv")
Read 24 items
           Common.name    Scientific.name Stuff1 Stuff2
1        Greylag.Goose        Anser.anser    AAC     rr
2           Snow.Goose Anser.caerulescens    AAC     rr
3 Greater.Canada.Goose  Branta.canadensis    AAC     rr
4       Barnacle.Goose   Branta.leucopsis    AAC     rr
5          Brent.Goose    Branta.bernicla    AAC     rr


Feel free to pick the function apart to see how it works.
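
For anyone who does want to pick it apart, here is a rough step-by-step sketch of the same idea on a tiny made-up file. The file contents, the two-column layout and the variable names are assumptions for illustration only, not the data from the question.

```r
# Hypothetical two-record file; "\r" is the character displayed as ^M.
path <- tempfile(fileext = ".csv")
writeBin(charToRaw("name, code\nGoose, \rAAC\nDuck, AAC\n"), path)

# Step 1: scan() returns the whitespace-delimited items, commas still attached.
items <- scan(path, what = "", quiet = TRUE)

# Step 2: remove the commas from every item.
items <- gsub(",", "", items)

# Step 3: refill a matrix row by row, two items per row in this toy file.
m <- matrix(items, ncol = 2, byrow = TRUE)

# Step 4: the first row becomes the column names, the rest the data.
d <- data.frame(m[-1, , drop = FALSE])
names(d) <- m[1, ]
d
```

This only works when no field contains a space or a comma of its own and every record has the same number of fields, since the rows are rebuilt purely by counting items.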

I suspect the source of the problem is that the ^M is in the field data, and because your fields aren't quoted, R can't tell whether it's a real line end or one inside a field. There are some notes about embedded newlines in quoted fields in the documentation for read.csv etc.
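
One way to check that suspicion is to look at the raw bytes and count carriage returns that are not part of a Windows CRLF line ending. This is only a sketch on a made-up file; the contents and the name bare_cr are assumptions, and for a real file you would point path at it instead.

```r
# Hypothetical file with one carriage return ("\r", shown as ^M) inside a record.
path <- tempfile(fileext = ".csv")
writeBin(charToRaw("name, code\nGoose, \rAAC\nDuck, AAC\n"), path)

# Read the file as raw bytes so nothing is translated on the way in.
bytes <- readBin(path, what = "raw", n = file.size(path))

# Positions of carriage returns (0x0d) and line feeds (0x0a).
cr <- which(bytes == as.raw(0x0d))
lf <- which(bytes == as.raw(0x0a))

# A CR that is not immediately followed by an LF is a bare ^M,
# not half of a CRLF line ending.
bare_cr <- cr[!((cr + 1) %in% lf)]
length(bare_cr)
```

If the count is above zero, the file has bare ^M characters, and the byte positions in bare_cr show where they sit.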
    
             
                                                        
            
            
              
                