Question:
I have a CSV file (24.1 MB) that I cannot fully read into my R session. When I open the file in a spreadsheet program I can see 112,544 rows. When I read it into R with read.csv I only get 56,952 rows and this warning:
cit
I can read the whole file into R with readLines:
rl
But I can't get this back into R as a table (via read.csv):
write.table(rl, "rl.txt", quote = FALSE, row.names = FALSE)
rl_in
How can I solve or work around this EOF message (which seems to be more of an error than a warning) to get the entire file into my R session?
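A readLines() result does not have to go back to disk through write.table() before it can be parsed: read.csv() (like read.table()) accepts a character vector of lines through its text argument. The sketch below uses a made-up vector standing in for the real rl, and it assumes the problem is a stray double quote, which is why it also sets quote = "".

```r
# Hypothetical stand-in for rl <- readLines("citations.CSV"):
# the third element contains an unbalanced double quote.
rl <- c("id,title",
        "1,plain title",
        '2,a 12" record',
        "3,another title")

# Parse the lines directly, no temporary file needed.
# quote = "" treats " as an ordinary character, so the stray quote
# cannot swallow the lines that follow it.
rl_in <- read.csv(text = rl, quote = "", stringsAsFactors = FALSE)

dim(rl_in)   # [1] 3 2
rl_in$title[2]
```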
I have similar problems with other methods of reading CSV files:
require(sqldf)
cit_sql
Here's my sessionInfo():
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] tools     tcltk     stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] ff_2.2-11             bit_1.1-10            data.table_1.8.8      sqldf_0.4-6.4
 [5] RSQLite.extfuns_0.0.1 RSQLite_0.11.4        chron_2.3-43          gsubfn_0.6-5
 [9] proto_0.3-10          DBI_0.2-7
Answer 1:
You need to disable quoting.
cit
I think it is because of lines like this one (note "Thorn" and "Minus"):
readLines("citations.CSV")[82]
[1] "10.2307/3642839,10.2307/3642839\t,\"Thorn\" and \"Minus\" in Hieroglyphic Luvian Orthography\t,H. Craig Melchert\t,Anatolian Studies\t,38\t,\t,1988-01-01T00:00:00Z\t,pp. 29-42\t,British Institute at Ankara\t,fla\t,\t,"
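Assuming the culprit is an unbalanced double quote somewhere in the file, one way to locate it is to count the double quotes on each line and flag the lines where the count is odd. The file below is made up. This check only catches quotes that never close; balanced quotes inside a field are not flagged.

```r
# Made-up sample: line 3 has one unbalanced quote, line 4 is properly quoted.
path <- tempfile(fileext = ".csv")
writeLines(c("id,title",
             "1,plain title",
             '2,a 12" record',
             '3,"quoted, with comma"'),
           path)

lines <- readLines(path)

# Number of double-quote characters per line:
# delete every other character, then count what is left.
n_quotes <- nchar(gsub('[^"]', "", lines))

# An odd count means some quote on that line is never closed.
suspect <- which(n_quotes %% 2 == 1)
suspect   # [1] 3
lines[suspect]
```

Disabling quoting is a trade-off: it stops the stray quote on line 3 from swallowing the lines after it, but it also breaks legitimately quoted fields such as line 4, whose embedded comma would then be read as a separator.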
Answer 2:
I'm a fairly new R user and thought I'd post this in case it helps anyone else. I was trying to read data from a comma-separated text file that included a few Spanish characters, and it took me a long time to figure it out. I knew I needed to use UTF-8 encoding, set the header argument to TRUE, and set the sep argument to ",", but I still ran into problems. After reading this post I tried setting the fill argument to TRUE, but then got the same "EOF within quoted string" warning, which I was able to fix in the same way as above (by disabling quoting). My successful read.table call looks like this:
target
The result keeps the Spanish characters and has the same dimensions as the original data, so I'm calling it a success! Thanks, all!
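Based only on the arguments listed in this answer (header = TRUE, sep = ",", fill = TRUE, quote = "" and UTF-8 encoding), a read.table() call of that shape might look like the sketch below. The file name and contents are made up, and using fileEncoding = "UTF-8" to declare the encoding is an assumption.

```r
# Made-up UTF-8 file with Spanish characters, a stray quote and a short row.
path <- tempfile(fileext = ".txt")
con <- file(path, open = "w", encoding = "UTF-8")
writeLines(c("nombre,ciudad,nota",
             'Ana,Madrid,"muy bien',
             "Jos\u00e9,Espa\u00f1a,",
             "Luis,Bogot\u00e1"),
           con)
close(con)

target <- read.table(path,
                     header = TRUE,           # first line holds the column names
                     sep = ",",               # comma-separated fields
                     fill = TRUE,             # pad short rows such as "Luis,Bogota" with empty fields
                     quote = "",              # treat " as an ordinary character
                     fileEncoding = "UTF-8",  # the file is UTF-8 encoded
                     stringsAsFactors = FALSE)

dim(target)   # [1] 3 3
target$ciudad
```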
Answer 3:
As pointed out above (see also the R help for read.csv), simply disabling quoting altogether by adding
quote = ""
to the read.csv() call worked for me.
The warning "EOF within quoted string" occurred with:
iproscan.53A.neg <- read.csv("interproscan.53A.neg.n.csv",
                             colClasses = c(pb.id = "character", genLoc = "character",
                                            icode = "character", length = "character",
                                            proteinDB = "character", protein.id = "character",
                                            prot.desc = "character", start = "character",
                                            end = "character", evalue = "character",
                                            tchar = "character", date = "character",
                                            ipro.id = "character", prot.name = "character",
                                            go.cat = "character", reactome.id = "character"),
                             as.is = TRUE, header = FALSE)
## Warning message:
## In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
##   EOF within quoted string
dim(iproscan.53A.neg)
## [1] 69383    16
The data frame that was read in was missing 6,619 lines. But after disabling quoting:
iproscan.53A.neg <- read.csv("interproscan.53A.neg.n.csv",
                             colClasses = c(pb.id = "character", genLoc = "character",
                                            icode = "character", length = "character",
                                            proteinDB = "character", protein.id = "character",
                                            prot.desc = "character", start = "character",
                                            end = "character", evalue = "character",
                                            tchar = "character", date = "character",
                                            ipro.id = "character", prot.name = "character",
                                            go.cat = "character", reactome.id = "character"),
                             as.is = TRUE, header = FALSE, quote = "")
dim(iproscan.53A.neg)
## [1] 76002    16
It ran without the warning, and all lines were read in successfully.
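Because every entry of the colClasses vector above is "character", a single string would also do: colClasses is recycled across all columns. The sketch below uses that shorter form, sets the column names through col.names, and then compares the number of rows with the number of lines in the file, which is how a shortfall like the 6,619 missing lines shows up. The file and the three column names (a hypothetical subset of the sixteen) are made up, and the line-count check assumes that no field legitimately spans several lines.

```r
# Made-up headerless file with a stray quote on the second line.
path <- tempfile(fileext = ".csv")
writeLines(c("P1,chr1,IPR000001",
             'P2,chr2,IPR000002 "partial',
             "P3,chr3,IPR000003"),
           path)

ipro <- read.csv(path,
                 header = FALSE,
                 col.names = c("pb.id", "genLoc", "ipro.id"),  # hypothetical subset of the names
                 colClasses = "character",  # recycled: every column is read as character
                 quote = "")                # no quote processing, so no lines are swallowed

# With header = FALSE, each line of the file should become exactly one row.
n_lines <- length(readLines(path))
c(rows = nrow(ipro), lines = n_lines)
```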
Answer 4:
I also ran into this problem, and was able to work around a similar EOF error using:
read.table("....csv", sep=",", ...)
Note that the separator is set explicitly here, in a call to the more general read.table().
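read.csv() is a wrapper around read.table() with different defaults, so switching functions changes more than where the separator is set. A quick way to compare the defaults that matter for quote problems is to inspect the formal arguments of both functions (base R, package utils):

```r
# Pull a few default arguments out of each reader's signature.
defaults <- function(f) formals(f)[c("sep", "quote", "fill", "comment.char")]

str(defaults(read.csv))
str(defaults(read.table))
```

In particular, read.table() treats both " and ' as quote characters by default and uses comment.char = "#", while read.csv() uses only " and disables comments. Depending on the data, a plain read.table(sep = ",") call can therefore behave differently on apostrophes or # characters, so passing quote and comment.char explicitly makes the behaviour predictable.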
Answer 5:
Actually, using read.csv() to read a file with free-text content is not a good idea. Disabling quoting with quote = "" is only a temporary solution: it only works when the problem is stray quotation marks. Other things can cause the same warning, for example some special characters.
For those special-character cases, the permanent solution is to inspect your file to find out what the special characters are and use regular expressions to remove them.
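A sketch of that regex clean-up, assuming the offending characters are ASCII control characters other than tab, newline and carriage return. The character class is an assumption; adjust it to whatever the inspection step turns up. The input lines are made up.

```r
# Made-up lines: line 3 contains a control character (0x01) inside a field.
raw_lines <- c("id,note",
               "1,fine",
               "2,bad\x01value",
               "3,tab\tkept")

# PCRE class: control characters except tab (0x09), newline (0x0A) and CR (0x0D).
ctrl <- "[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F\\x7F]"

# Inspect first: which lines contain such characters?
which(grepl(ctrl, raw_lines, perl = TRUE))   # [1] 3

# Then strip them and parse the cleaned lines.
clean <- gsub(ctrl, "", raw_lines, perl = TRUE)
notes <- read.csv(text = clean, stringsAsFactors = FALSE)
notes$note
```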
Have you considered installing the {data.table} package and using fread() to read the file? It is much faster and should not bother you with this EOF warning. Note that the object it returns is a data.table (which also inherits from data.frame). data.table has many useful features, but you can convert the result with as.data.frame() if needed.
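A minimal fread() sketch, assuming the data.table package is installed; the file is made up and contains a quote in the middle of an unquoted field. fread() returns an object of class c("data.table", "data.frame"), and as.data.frame() turns it into a plain data frame for code that expects one.

```r
library(data.table)

path <- tempfile(fileext = ".csv")
writeLines(c("id,title",
             "1,plain title",
             '2,a 12" record',
             "3,another title"),
           path)

dt <- fread(path)   # separator and header are detected automatically
class(dt)           # [1] "data.table" "data.frame"

df <- as.data.frame(dt)
class(df)           # [1] "data.frame"
```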
Answer 6:
I had a similar problem: an EOF warning, and only part of the data was loading with read.csv(). I tried quote = "", but it only removed the EOF warning.
But when I looked at the first row that was not loading, I found a special character in one of the cells, displayed as an arrow → (hexadecimal value 0x1A, the ASCII SUB control character). After deleting it, the data loaded normally.
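Instead of deleting the character by hand, it can be stripped at the byte level, so that it is gone before any text parsing happens. A sketch on a made-up file; the 0x1A value is taken from this answer, and any other unwanted byte could be handled the same way.

```r
# Made-up file: the label on line 3 contains a SUB byte (0x1A).
path <- tempfile(fileext = ".csv")
writeLines(c("id,label", "1,ok", "2,bad\x1avalue", "3,ok"), path)

# Read the raw bytes and locate every 0x1A.
bytes <- readBin(path, what = "raw", n = file.size(path))
sub_byte <- as.raw(0x1a)
which(bytes == sub_byte)   # byte offsets of the SUB characters

# Drop those bytes, then parse the cleaned text.
clean_text <- rawToChar(bytes[bytes != sub_byte])
cleaned <- read.csv(text = clean_text, stringsAsFactors = FALSE)
cleaned$label
```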