Question
I have many .csv files containing variables for the same "population", keyed by surname and first.name. So every csv has three columns: first name, surname, and the variable of interest.
I load each of them into a separate data table, and then I want to merge them.
library(data.table)
surnames <- c('A', 'B')
first.names <- c('C', 'D')
weights <- c(80, 90)
heights <- c(180, 190)
write.csv(data.frame(surname = surnames, first.name = first.names,
                     height = heights), file = 'variable-height.csv')
write.csv(data.frame(surname = surnames, first.name = first.names,
                     weight = weights), file = 'variable-weight.csv')
variables.to.load <- c('height', 'weight')
for (i in variables.to.load) {
  assign(paste0('DT.', i), fread(paste0('variable-', i, '.csv')))
  print(dim(eval(parse(text = paste0('DT.', i)))))
  setkey(eval(parse(text = paste0('DT.', i))), surname, first.name)
}
loads them and sets the keys correctly. What I am missing, though, is the automatic merging.
DT.merged <- Reduce(merge, list(DT.height, DT.weight))
works, but I want to do it in an automatic way, since the real variables are many more. That is, I want to write the contents of list() (DT.height, DT.weight, etc.) in an automatic way.
I have tried:
library('stringr')
DT.merged <- Reduce(merge, list(eval(parse(text = str_c(paste0('DT.', variables.to.load), collapse = ', ')))))
with no results.
I go through this whole process because I want to selectively load different variables for my population (the full data is a single csv of more than 30GB with around 30 variables). Using fread on the full csv to selectively read columns seems rather slow.
Answer 1:
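This should work for your question:
DTlist <- lapply(paste0('variable-', variables.to.load, '.csv'),
                 function(x) {
                   d <- fread(x)
                   setkey(d, surname, first.name)  # key each table for the merge
                   d
                 })
# merge the tables in the list one after another on their shared key
DT.merged <- Reduce(merge, DTlist)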
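If the tables already exist in the workspace under names such as DT.height and DT.weight, as in the question, one possible alternative is base R's mget(), which returns a named list of objects looked up by name. The attempt in the question most likely fails because the string "DT.height, DT.weight" is not a single valid R expression, so parse() cannot turn it into the arguments of list(). The sketch below builds the two tables in memory rather than reading them from disk, which is an assumption made only to keep it self-contained:

```r
library(data.table)

# Stand-ins for the tables the question's loop creates (built in memory here).
DT.height <- data.table(surname = c('A', 'B'), first.name = c('C', 'D'),
                        height = c(180, 190))
DT.weight <- data.table(surname = c('A', 'B'), first.name = c('C', 'D'),
                        weight = c(80, 90))
setkey(DT.height, surname, first.name)
setkey(DT.weight, surname, first.name)

variables.to.load <- c('height', 'weight')

# mget() looks the objects up by name and returns them as a named list,
# so no eval(parse()) is needed.
DT.list <- mget(paste0('DT.', variables.to.load))

# Merge the list elements one after another on the shared key columns.
DT.merged <- Reduce(merge, DT.list)
```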
That being said, as Roland and I allude to in the comments, this is unlikely to be the best approach if you have access to a single CSV file with all your desired data.
If you do have access to such a file, you'd be better served by the select parameter of fread:
DT <- fread('master.csv', select=c(variables.to.load))
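Note that select keeps only the columns it names, so the key columns probably need to be listed alongside the variables if the result is to stay keyed by surname and first.name. A minimal sketch, assuming a hypothetical master file (written to a temporary path here, with an extra age column that is deliberately not loaded):

```r
library(data.table)

# Hypothetical stand-in for the single large file (written to a temp path).
master.file <- tempfile(fileext = '.csv')
fwrite(data.table(surname = c('A', 'B'), first.name = c('C', 'D'),
                  height = c(180, 190), weight = c(80, 90),
                  age = c(30, 40)),
       master.file)

variables.to.load <- c('height', 'weight')
key.columns <- c('surname', 'first.name')

# Read only the key columns plus the requested variables; 'age' is skipped.
DT <- fread(master.file, select = c(key.columns, variables.to.load))
setkeyv(DT, key.columns)
```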
Source: https://stackoverflow.com/questions/35654357/read-in-and-merge-many-csv-files-into-data-table