Read in and merge many CSV files into data.table

Submitted by China☆狼群 on 2019-12-24 13:34:11

Question


I have many .csv files containing variables for the same "population", keyed by surname and first.name. Every CSV has three columns: first name, surname, and the variable of interest. I load each of them into a separate data.table and then want to merge them.

library(data.table)
surnames <- c('A', 'B')
first.names <- c('C', 'D')
weights <- c(80, 90)
heights <- c(180, 190)

# row.names = FALSE keeps each file at exactly three columns
write.csv(data.frame(surname = surnames, first.name = first.names,
                     height = heights),
          file = 'variable-height.csv', row.names = FALSE)
write.csv(data.frame(surname = surnames, first.name = first.names,
                     weight = weights),
          file = 'variable-weight.csv', row.names = FALSE)

variables.to.load <- c('height', 'weight')
for (i in variables.to.load) {
  # Read each file into a data.table named DT.<variable>
  assign(paste0('DT.', i), fread(paste0('variable-', i, '.csv')))
  print(dim(eval(parse(text = paste0('DT.', i)))))
  # Key each table by the two name columns
  setkey(eval(parse(text = paste0('DT.', i))), surname, first.name)
}

This loads the files and sets the keys correctly. What I am missing, though, is the automatic merging.

DT.merged <- Reduce(merge, list(DT.height, DT.weight))

works, but I want to do it automatically, since the real data has many more variables. That is, I want to generate the contents of list() (DT.height, DT.weight, etc.) automatically.

I have tried:

library('stringr')
DT.merged <- Reduce(merge, list(eval(parse(text = str_c(paste0('DT.', variables.to.load), collapse = ', ')))))

with no results.

I go through this whole process because I want to selectively load different variables for my population (the full data is a single CSV of more than 30 GB with around 30 variables). Using fread on the full CSV to selectively read columns seems rather slow.


Answer 1:


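This should work for your question:

# Read every file into a keyed data.table and collect them in a list
DTlist <- lapply(paste0('variable-', variables.to.load, '.csv'),
  function(x) {
    d <- fread(x)
    setkey(d, surname, first.name)
    d
  }
)
# Merge the list pairwise on the shared key
DT.merged <- Reduce(merge, DTlist)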
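
If the tables already exist under names like DT.height and DT.weight, as in the question's loop, the list can also be built from those names. The attempt with eval(parse(...)) cannot work, because a comma-separated string such as "DT.height, DT.weight" is not a single R expression and parse() rejects it. A minimal sketch using base R's mget(), which looks objects up by name and returns them as a named list; the two tables below are stand-ins for the ones read from disk, and merge_on_names is a hypothetical helper introduced only for this illustration:

```r
library(data.table)

# Stand-ins for the tables created by the loop in the question
DT.height <- data.table(surname = c('A', 'B'), first.name = c('C', 'D'),
                        height = c(180, 190))
DT.weight <- data.table(surname = c('A', 'B'), first.name = c('C', 'D'),
                        weight = c(80, 90))
variables.to.load <- c('height', 'weight')

# mget() takes a character vector of object names and returns a named list
DT.list <- mget(paste0('DT.', variables.to.load))

# Hypothetical helper: name the join columns explicitly rather than
# relying on both tables having been keyed beforehand
merge_on_names <- function(x, y) merge(x, y, by = c('surname', 'first.name'))

DT.merged <- Reduce(merge_on_names, DT.list)
```

Naming the join columns in by is a design choice, not a requirement: it keeps the merge correct even if one of the tables was never keyed.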

That being said, as Roland and I allude to in comments, this is unlikely to be the best approach if you have access to a single CSV file with all your desired data.

If you do have access to such a file, you'd be better served by the select parameter of fread. Include the key columns in select, otherwise they are dropped:

DT <- fread('master.csv', select = c('surname', 'first.name', variables.to.load))
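
To try the select approach without the real 30 GB file, here is a small self-contained sketch. The temporary file and the extra age column are assumptions made up for the demonstration; they only stand in for a master CSV that holds more variables than are needed:

```r
library(data.table)

# Hypothetical stand-in for the single large CSV (written to a temp file)
master.file <- tempfile(fileext = '.csv')
fwrite(data.table(surname = c('A', 'B'), first.name = c('C', 'D'),
                  height = c(180, 190), weight = c(80, 90),
                  age = c(30, 40)),
       master.file)

variables.to.load <- c('height', 'weight')

# Only the listed columns are kept; the key columns must be listed too
DT <- fread(master.file,
            select = c('surname', 'first.name', variables.to.load))
setkey(DT, surname, first.name)
```

No merge is needed here, since all selected variables arrive in one table that is already aligned by row.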


Source: https://stackoverflow.com/questions/35654357/read-in-and-merge-many-csv-files-into-data-table
