Question
I have a "csv" text file in which each field is separated by \t&%$# and which I'm now trying to import into R. The sep= argument of read.table() insists on a single character. Is there a quick way to import this file directly?
Some of the data fields are user-submitted text which contain tabs, quotes, and other messy stuff, so changing the delimiter to something simpler seems like it could create other problems.
Answer 1:
The following code will be able to handle multiple separator chars:
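# fileName:   fully qualified path to the file
# separators: a regular expression; separate alternative delimiters with '|'
#             and escape regex metacharacters (e.g. "\t&%\\$#" for the delimiter above)
read <- function(fileName, separators) {
  # readLines() opens and closes the file itself when given a path
  data <- readLines(fileName)
  # strsplit() is vectorised: one character vector of fields per line
  records <- strsplit(data, split = separators)
  # Stack the records row-wise; this assumes every line has the same number of fields
  dataFrame <- data.frame(do.call(rbind, records), stringsAsFactors = FALSE)
  return(dataFrame)
}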
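For the single literal delimiter in the question, a variant that passes fixed = TRUE to strsplit() avoids regex escaping altogether. The following is only a minimal sketch: the function name read_delim_literal and its header argument are hypothetical, and it assumes every line has the same number of fields and that no field contains a line break. Note that strsplit() drops a trailing empty field, so lines ending in the delimiter would need extra handling.

```r
# Hypothetical helper (not from the answers above): split each line on one
# literal, multi-character delimiter.
read_delim_literal <- function(path, delim, header = FALSE) {
  lines <- readLines(path, warn = FALSE)
  # fixed = TRUE treats delim as a plain string, so '$' is not a regex anchor
  fields <- strsplit(lines, delim, fixed = TRUE)
  # Assumes every line yields the same number of fields
  mat <- do.call(rbind, fields)
  if (header) {
    # Use the first row as column names and the rest as data
    df <- as.data.frame(mat[-1, , drop = FALSE], stringsAsFactors = FALSE)
    names(df) <- mat[1, ]
  } else {
    df <- as.data.frame(mat, stringsAsFactors = FALSE)
  }
  df
}

# Demo on a throwaway file that uses the delimiter from the question
delim <- "\t&%$#"
tmp <- tempfile()
writeLines(c(paste("id", "comment", sep = delim),
             paste("1", "has a\ttab and a \"quote\"", sep = delim)), tmp)
result <- read_delim_literal(tmp, delim, header = TRUE)
```

All columns come back as character vectors; utils::type.convert() can be applied afterwards if numeric columns are needed.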
Answer 2:
As explained in this post, it is not possible in R without resorting to string parsing. You can pre-parse your file in another language (Awk, Perl, Python etc.) or read it line-by-line and parse the resulting strings in R.
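One way to do the line-by-line parsing in R is to swap the long delimiter for a single character that is unlikely to occur in the data, then hand the lines to read.table(). This is a sketch under assumptions, not a definitive implementation: the function name read_via_placeholder is hypothetical, the ASCII unit separator "\x1f" is an arbitrary choice of placeholder, and the code stops if that character already appears in the file. Quote and comment handling are switched off so that quotes and '#' inside user text are kept literally.

```r
# Hypothetical helper: replace the multi-character delimiter with a
# single-character placeholder, then let read.table() do the parsing.
read_via_placeholder <- function(path, delim, placeholder = "\x1f", ...) {
  lines <- readLines(path, warn = FALSE)
  # Refuse to continue if the placeholder would collide with real data
  if (any(grepl(placeholder, lines, fixed = TRUE))) {
    stop("placeholder character already occurs in the file")
  }
  # fixed = TRUE: treat the delimiter as a literal string, not a regex
  lines <- gsub(delim, placeholder, lines, fixed = TRUE)
  read.table(text = lines, sep = placeholder, quote = "",
             comment.char = "", stringsAsFactors = FALSE, ...)
}

# Demo on a throwaway file that uses the delimiter from the question
delim2 <- "\t&%$#"
tmp2 <- tempfile()
writeLines(c(paste("id", "comment", sep = delim2),
             paste("1", "has a\ttab and a \"quote\"", sep = delim2)), tmp2)
parsed <- read_via_placeholder(tmp2, delim2, header = TRUE)
```

Because read.table() does the final parsing, column types are inferred as usual and extra arguments such as header or na.strings can be passed through.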
Source: https://stackoverflow.com/questions/18186357/importing-csv-file-with-multiple-character-separator-to-r