Question
I have a .csv file with account codes in the form 00xxxxx, and I need them to stay that way for use with other programs that expect the account codes in this format. I was working on an R script to reconcile account charges on Friday and could have sworn that as.is = T was working for me. Now it doesn't seem to be. Here's some example data:
test <- data.frame(col1 = c("apple", "banana", "carrot"),
col2 = c(100, 200, 300),
col3 = c("00234", "00345", "00456"))
My write.table strategy:
write.table(test, file = "C:/path/test.csv", quote = T,
sep=",", row.names = F)
Remove the old data.frame and re-read:
rm(test)
test <- read.csv("C:/path/test.csv")
test
    col1 col2 col3
1  apple  100  234
2 banana  200  345
3 carrot  300  456
In case it's not clear, it should look like the original data.frame we created:
test
    col1 col2  col3
1  apple  100 00234
2 banana  200 00345
3 carrot  300 00456
I also tried the following, after perusing the available read.table options, with the results all the same as above:
test <- read.csv("C:/path/test.csv", quote = '"')
test <- read.csv("C:/path/test.csv", as.is = T)
test <- read.csv("C:/path/test.csv", as.is = T, quote = '"')
stringsAsFactors didn't seem to be relevant in this case (and it sounds like as.is will do the same thing).
When I open the file in Emacs, col3 is, indeed, surrounded by quotes, so I'd expect it to be treated like text instead of converted to a number:
"col1","col2","col3"
"apple",100,"00234"
"banana",200,"00345"
"carrot",300,"00456"
Most of the other questions are simply about not treating things like factors, or getting numbers not to be recognized as characters, usually the result of an overlooked character string in that column.
I see I can pursue the colClasses argument from questions like this one, but I'd prefer not to; my "colClasses" are built into the data :) Quoted = character, not quoted = numeric.
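For context, here is a minimal sketch of why as.is does not help. Per the read.table documentation, unless colClasses is given, every column is first read as character (with the quotes already stripped) and then passed through type.convert, and as.is only controls whether character columns become factors. The inline csv_text below is an assumed copy of what write.table produces for the example data.

```r
# Assumed contents of the file written by write.table() in the question.
csv_text <- '"col1","col2","col3"
"apple",100,"00234"
"banana",200,"00345"
"carrot",300,"00456"'

# The quotes are consumed while the fields are split, before type detection runs.
default_read <- read.csv(text = csv_text, as.is = TRUE)
class(default_read$col3)

# type.convert() is the step that turns the already-unquoted "00234" into a number.
converted <- type.convert(c("00234", "00345", "00456"), as.is = TRUE)
class(converted)
```

By the time type detection runs, nothing records which fields were quoted, which is why the three read.csv variants above all behave the same.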
Answer 1:
I expect there's a better method, but one option would be to use quote = "":
test <- read.csv("C:/path/test.csv", as.is = TRUE, quote = "")
This would make the quotes part of the values, giving you:
test
#col1 col2 col3
#1 apple 100 "00234"
#2 banana 200 "00345"
#3 carrot 300 "00456"
You could then either keep them in that format, or use something like gsub to remove them:
test$col3 <- gsub('"', '', test$col3)
test
#col1 col2 col3
#1 apple 100 00234
#2 banana 200 00345
#3 carrot 300 00456
You can use some kind of apply-type function to do the gsub on the whole data frame at once:
test <- as.data.frame(sapply(test,gsub,pattern='"',replacement=""))
sapply code taken from: R - how to replace parts of variable strings within data frame
Obviously, this method will only be useful to you if you don't need the quotes elsewhere for other reasons.
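A caveat, with a hedged sketch: if the file is exactly as written in the question, the header and col1 are quoted too, so quote = "" leaves quotes on those as well, and the default check.names = TRUE turns the quoted header into names like X.col3. instead of col3. The sapply approach also returns a character matrix, so col2 would stop being numeric. The version below cleans only the names and the character columns; strip_quotes is a hypothetical helper name, and csv_text is an assumed copy of the file.

```r
# Assumed contents of the file written by write.table() in the question.
csv_text <- '"col1","col2","col3"
"apple",100,"00234"
"banana",200,"00345"
"carrot",300,"00456"'

# check.names = FALSE keeps the header text unchanged (quotes included)
# so it can be cleaned explicitly below.
test <- read.csv(text = csv_text, as.is = TRUE, quote = "", check.names = FALSE)

# Hypothetical helper: drop every double quote from a character vector.
strip_quotes <- function(x) gsub('"', "", x, fixed = TRUE)

names(test) <- strip_quotes(names(test))

# Only touch character columns, so col2 keeps its numeric type.
is_text <- vapply(test, is.character, logical(1))
test[is_text] <- lapply(test[is_text], strip_quotes)

str(test)
```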
Answer 2:
I pinged a couple of friends who are R users, and both suggested using colClasses. I was relieved to find that I didn't need to specify each class, since my data has ~25 columns. SO confirmed this (once I knew what I was looking for) in another question.
test <- read.csv("C:/path/test.csv", colClasses = c(col3 = "character"))
test
    col1 col2  col3
1  apple  100 00234
2 banana  200 00345
3 carrot  300 00456
As it currently stands, the question is a duplicate of the other with respect to the solution. The difference is that I was looking for ways other than colClasses (since as.is sounds like such a likely candidate), while that question was about how to use colClasses.
I'll reiterate that I don't actually like this solution, even though it's pretty simple. Quotes denote text fields in a .csv, and they don't seem to be respected in this case. The LibreOffice .csv import has a checkbox for "Treat quoted fields as text," which I'd think is analogous to as.is = T in R. Obviously not! #end_rant
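If the goal is "quoted = character, not quoted = numeric" without typing out the classes by hand, one possible workaround is to peek at the raw first data row and build colClasses from it. This is only a sketch under stated assumptions: quoted_col_classes is a hypothetical helper, not part of R, and it assumes a simple file where no field contains an embedded separator or line break and where the first data row is representative of the whole column.

```r
# Write the question's example CSV to a temporary file so the sketch is self-contained.
path <- tempfile(fileext = ".csv")
writeLines(c('"col1","col2","col3"',
             '"apple",100,"00234"',
             '"banana",200,"00345"',
             '"carrot",300,"00456"'), path)

# Hypothetical helper: "character" for columns whose first data field is quoted,
# NA (meaning "let read.csv guess") for the rest.
quoted_col_classes <- function(path, sep = ",") {
  first_row <- readLines(path, n = 2)[2]
  raw_fields <- strsplit(first_row, sep, fixed = TRUE)[[1]]
  ifelse(grepl('^".*"$', raw_fields), "character", NA_character_)
}

classes <- quoted_col_classes(path)
test <- read.csv(path, colClasses = classes)
str(test)
```

This still goes through colClasses, but the classes come from the quoting in the file rather than from a hand-maintained list.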
Answer 3:
I have this issue too. Of course you can manually specify colClasses, but why is this necessary when the data is quoted? I agree with the OP's 'rant' in the answer posted to his own question:
Quotes denote text fields in a .csv, and they don't seem to be respected in this case.
Anyway, I elected to use data.table's fread(), which doesn't have this issue. Still annoying behaviour for read.csv, though.
# here's a data frame with chr and int columns
my_df <- data.frame(chars=letters[1:5],
nums=1:5,
txt_nums=sprintf('%02d', 1:5),
stringsAsFactors=F)
# all looks as it should
lapply(my_df, class)
# $chars
# [1] "character"
#
# $nums
# [1] "integer"
#
# $txt_nums
# [1] "character"
But now, write to csv, read it back in, and the third column is coerced to int!
# quote=T redundant since that's the default, but just to be sure
write.csv(my_df, 'my_df.csv', row.names=F, quote=T)
my_df2 <- read.csv('my_df.csv')
lapply(my_df2, class)
# even with as.is=TRUE, same issue
my_df2 <- read.csv('my_df.csv', as.is=T)
lapply(my_df2, class)
# data.table's fread doesn't have this issue, at least
library(data.table)
my_dt <- fread('my_df.csv')
lapply(my_dt, class)
Source: https://stackoverflow.com/questions/22923756/r-respect-quotes-around-numbers-treat-as-character-with-read-csv