I am attempting unsuccessfully to read a *.csv file containing hidden or invisible characters. The file contents are shown here:
my.data2 <- read.table(text
Here's a solution using scan to read the data, matrix to structure it, and data.frame to make it into a data frame:
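readF <- function(path, nfields = 4) {
  # scan() splits on whitespace (line breaks included); gsub() strips the commas
  tokens <- gsub(",", "", scan(path, what = rep("", nfields)))
  # refill the tokens row by row, nfields per record
  m <- matrix(tokens, ncol = nfields, byrow = TRUE)
  # first row holds the column names, the rest is data
  d <- data.frame(m[-1, , drop = FALSE])
  names(d) <- m[1, ]
  d
}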
So first check that the file reproduces your problem:
> read.csv("./invisible.delimiter2.csv")
Common.name Scientific.name Stuff1 Stuff2
1 Greylag.Goose Anser.anser
2 AAC rr
3 Snow.Goose
4 Anser.caerulescens
5 AAC rr
6 Greater.Canada.Goose Branta.canadensis AAC rr
7 Barnacle.Goose Branta.leucopsis
8 AAC rr
9 Brent.Goose Branta.bernicla
10 AAC rr
and then see if my function solves it:
> readF("./invisible.delimiter2.csv")
Read 24 items
Common.name Scientific.name Stuff1 Stuff2
1 Greylag.Goose Anser.anser AAC rr
2 Snow.Goose Anser.caerulescens AAC rr
3 Greater.Canada.Goose Branta.canadensis AAC rr
4 Barnacle.Goose Branta.leucopsis AAC rr
5 Brent.Goose Branta.bernicla AAC rr
Feel free to pick the function apart to see how it works.
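If it helps, here is a rough step-by-step sketch of what the function does. The file written below is a hypothetical stand-in, not your actual file: I'm assuming the stray break behaves like an ordinary line break and that no field contains a space.

```r
# Hypothetical stand-in for the problem file: records are split by stray line breaks
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Common.name, Scientific.name, Stuff1, Stuff2",
  "Greylag.Goose, Anser.anser,",
  "AAC, rr",
  "Snow.Goose,",
  "Anser.caerulescens, AAC, rr"
), path)

# Step 1: scan() splits on whitespace, line breaks included,
# so where the line breaks fall no longer matters
tokens <- scan(path, what = "", quiet = TRUE)

# Step 2: strip the commas that are still attached to the tokens
tokens <- gsub(",", "", tokens)

# Step 3: refill the tokens row by row, four fields per record
m <- matrix(tokens, ncol = 4, byrow = TRUE)

# Step 4: the first row becomes the column names, the rest the data
d <- data.frame(m[-1, , drop = FALSE])
names(d) <- m[1, ]
d
```

Note that this only works because every record has exactly nfields whitespace-free fields. A field containing a space would be split into two tokens and shift everything after it.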
I suspect the source of the problem is that the ^M is in the field data, and because your fields aren't quoted, R can't tell whether it's a real line end or one inside a field. There are some notes about embedded newlines in quoted fields in the documentation for read.csv etc.
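To illustrate the quoting point, here is a small sketch with a made-up quoted file (the contents and the names path_q and quoted are just for illustration, not your data). When the field containing the line break is quoted, read.csv should keep the record together, and the break ends up inside the field value where it can be removed afterwards.

```r
# Hypothetical quoted file: the stray line break sits inside a quoted field
path_q <- tempfile(fileext = ".csv")
writeLines(c(
  '"Common.name","Scientific.name","Stuff1","Stuff2"',
  '"Greylag Goose","Anser anser',
  '","AAC","rr"',
  '"Snow Goose","Anser caerulescens","AAC","rr"'
), path_q)

# The quoted field spans two physical lines but is read as one record
quoted <- read.csv(path_q)
nrow(quoted)

# The line break is kept as part of the value, so strip it afterwards
quoted$Scientific.name <- gsub("[\r\n]", "", quoted$Scientific.name)
quoted
```

So if you control how the file is exported, quoting the fields is probably the cleaner fix. The scan approach is a workaround for when you don't.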