Read csv file with hidden or invisible character ^M

死守一世寂寞 2021-01-25 08:48

I am attempting unsuccessfully to read a *.csv file containing hidden or invisible characters. The file contents are shown here:

my.data2 <- read.table(text          


        
3 Answers
  •  离开以前
    2021-01-25 09:43

    Here's a solution using scan to read the data, matrix to structure it, and data.frame to make it into a data frame:

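    readF <- function(path, nfields = 4) {
        # With an atomic `what`, scan() ignores line structure and returns every
        # whitespace-separated token as one flat character vector, so it does not
        # matter where the ^M line breaks fall. Fields must not contain spaces.
        tokens <- scan(path, what = "")
        # Drop the commas left stuck to each token.
        tokens <- gsub(",", "", tokens)
        # Refill the tokens row by row, nfields per row.
        m <- matrix(tokens, ncol = nfields, byrow = TRUE)
        # The first row holds the column names; the rest is data.
        # drop = FALSE keeps a matrix even when there is only one data row.
        d <- data.frame(m[-1, , drop = FALSE])
        names(d) <- m[1, ]
        d
    }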
    

    So first, check that the file reproduces your problem:

    > read.csv("./invisible.delimiter2.csv")
                Common.name    Scientific.name Stuff1 Stuff2
    1         Greylag.Goose        Anser.anser              
    2                   AAC                 rr              
    3            Snow.Goose                                 
    4    Anser.caerulescens                                 
    5                   AAC                 rr              
    6  Greater.Canada.Goose  Branta.canadensis    AAC     rr
    7        Barnacle.Goose   Branta.leucopsis              
    8                   AAC                 rr              
    9           Brent.Goose    Branta.bernicla              
    10                  AAC                 rr        
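    Before parsing, it can also help to confirm at the byte level that a stray ^M is really present. This is a minimal sketch, not part of the original answer: find_stray_cr and the sample file are hypothetical, and it assumes that any carriage return (byte 0x0D) not directly followed by a line feed is the invisible character in question.

```r
# Hypothetical helper: return the byte positions of carriage returns
# (^M, 0x0D) that are NOT part of a CRLF line ending.
find_stray_cr <- function(path) {
  n <- file.info(path)$size
  bytes <- readBin(path, what = "raw", n = n)
  cr <- which(bytes == as.raw(0x0d))
  # A CR followed by LF (0x0A) is a normal Windows line ending; a CR in the
  # last byte has no successor, so pmin() keeps the index in range.
  followed_by_lf <- cr < n & bytes[pmin(cr + 1, n)] == as.raw(0x0a)
  cr[!followed_by_lf]
}

# Hypothetical sample, written byte-for-byte so the result does not depend
# on the platform's text-mode line endings.
path <- tempfile(fileext = ".csv")
writeBin(charToRaw(paste0(
  "Common.name,Scientific.name,Stuff1,Stuff2\n",
  "Greylag.Goose,Anser.anser,\rAAC,rr\n")), path)

find_stray_cr(path)
```

    On this sample it reports position 69, the ^M between "Anser.anser," and "AAC". An empty result means every CR in the file belongs to an ordinary CRLF line ending.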
    

    Then see whether my function solves it:

    > readF("./invisible.delimiter2.csv")
    Read 24 items
               Common.name    Scientific.name Stuff1 Stuff2
    1        Greylag.Goose        Anser.anser    AAC     rr
    2           Snow.Goose Anser.caerulescens    AAC     rr
    3 Greater.Canada.Goose  Branta.canadensis    AAC     rr
    4       Barnacle.Goose   Branta.leucopsis    AAC     rr
    5          Brent.Goose    Branta.bernicla    AAC     rr
    

    Feel free to pick the function apart to see how it works.

    I suspect the source of the problem is that the ^M is inside the field data, and because your fields aren't quoted, R can't tell whether it's a real line end or one inside a field. There are some notes about embedded newlines in quoted fields in the documentation for read.csv etc.
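    If stray carriage returns are indeed the cause, another option (a different technique from readF, sketched here under assumptions) is to delete every 0x0D byte before parsing, so read.csv only ever sees LF line endings. read_csv_without_cr and the sample file are hypothetical. The sketch assumes lines end in LF or CRLF: on a file that uses a bare CR as its only line terminator, removing CRs would merge all records into one line.

```r
# Hypothetical helper (not from the answer above): drop every carriage
# return (^M, byte 0x0D) before handing the text to read.csv().
# Assumes lines end in LF or CRLF, not bare CR.
read_csv_without_cr <- function(path, ...) {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  bytes <- bytes[bytes != as.raw(0x0d)]
  read.csv(text = rawToChar(bytes), ...)
}

# Hypothetical sample file with a stray ^M in the middle of the first record.
path <- tempfile(fileext = ".csv")
writeBin(charToRaw(paste0(
  "Common.name,Scientific.name,Stuff1,Stuff2\n",
  "Greylag.Goose,Anser.anser,\rAAC,rr\n",
  "Barnacle.Goose,Branta.leucopsis,AAC,rr\n")), path)

d <- read_csv_without_cr(path, stringsAsFactors = FALSE)
d
```

    Unlike readF, this keeps read.csv's own parsing, so fields may contain spaces and the usual read.csv arguments can be passed through.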
