Read csv file with hidden or invisible character ^M

死守一世寂寞 2021-01-25 08:48

I am attempting unsuccessfully to read a *.csv file containing hidden or invisible characters. The file contents are shown here:

my.data2 <- read.table(text

既然无缘 (original poster) 2021-01-25 09:26

Here is code that can handle white space (i.e., multiple words) within fields:

nfields <- 4

bb <- readLines('c:/users/mmiller21/simple R programs/invisible.delimiter4.csv')
bb

pattern <- "(?<=\\,)(?=)"                  # split after each comma; the comma stays on the left piece
cc <- strsplit(bb, pattern, perl=TRUE)
dd <- unlist(cc)
ee <- dd[dd != ' ' & dd != '' & dd != ','] # remove empty elements
ff <- gsub(",", "", ee)                    # remove commas

m <- matrix(ff, ncol=nfields, byrow=TRUE)  # store data in matrix

# returns string w/o leading or trailing whitespace
trim <- function (x) gsub("^\\s+|\\s+$", "", x)
nn <- trim(m)
nn
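
To make the split step easier to follow, here is a small sketch that applies the same pattern to a single line from the data set, without needing the file on disk. The variable names (one_line, parts, cleaned) are only for illustration, and trimws() assumes R 3.2.0 or later.

```r
# One line copied from the data set shown in this answer
one_line <- "Greylag Goose, Anser anser, AAC aa, rr bb"

# Same look-behind pattern as above: each piece ends right after a comma
pattern <- "(?<=\\,)(?=)"
parts <- strsplit(one_line, pattern, perl = TRUE)[[1]]
parts

# Remove the commas, then trim leading and trailing whitespace
cleaned <- trimws(gsub(",", "", parts))
cleaned
```

The pieces still carry their trailing comma and leading space after the split, which is why the code above removes commas and trims whitespace afterwards.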


Here are the contents of the original data set:

Common.name, Scientific.name, Stuff1, Stuff2
Greylag Goose, Anser anser, AAC aa, rr bb
Snow Goose, Anser caerulescens, AAC aa aa, rr bb bb
Greater Canada Goose, Branta canadensis, AAC, rr bb
Barnacle Goose, Branta leucopsis, AAC aa, rr
Brent Goose, Branta bernicla, AAC, rr bb bb bb


I simply removed the dots from the common names and scientific names and added extra text to the third and fourth columns.

Here is the output:

     [,1]                   [,2]                 [,3]        [,4]         
[1,] "Common.name"          "Scientific.name"    "Stuff1"    "Stuff2"     
[2,] "Greylag Goose"        "Anser anser"        "AAC aa"    "rr bb"      
[3,] "Snow Goose"           "Anser caerulescens" "AAC aa aa" "rr bb bb"   
[4,] "Greater Canada Goose" "Branta canadensis"  "AAC"       "rr bb"      
[5,] "Barnacle Goose"       "Branta leucopsis"   "AAC aa"    "rr"         
[6,] "Brent Goose"          "Branta bernicla"    "AAC"       "rr bb bb bb"
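
For comparison, here is a possible alternative sketch that uses only base R's read.csv. It assumes the hidden ^M character is a trailing carriage return ("\r") at the end of each line, which is what ^M usually denotes; that is a guess about the file, not something confirmed above. The inline text stands in for the real file, and the names raw, lines and dat are made up for the example.

```r
# Inline copy of part of the data with "\r\n" line endings; the "\r" is
# the carriage return that some editors display as ^M (an assumption here).
raw <- paste0(
  "Common.name, Scientific.name, Stuff1, Stuff2\r\n",
  "Greylag Goose, Anser anser, AAC aa, rr bb\r\n",
  "Snow Goose, Anser caerulescens, AAC aa aa, rr bb bb\r\n"
)

# Split into lines on "\n", then drop any trailing carriage return
lines <- strsplit(raw, "\n", fixed = TRUE)[[1]]
lines <- sub("\r$", "", lines)

# strip.white = TRUE trims blanks around each field but keeps inner spaces
dat <- read.csv(text = lines, strip.white = TRUE, stringsAsFactors = FALSE)
dat
```

Unlike the matrix approach, this returns a data frame and treats the first line as column names, so it may or may not suit the same purpose.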

    
             
                                                        
            
            
              
                