I'm attempting to merge multiple CSV files using R. All of the CSV files have the same fields and are all in a shared folder containing only these CSV files. I've attempted
Here is the best approach I have found:
library(pacman)
p_load(doParallel, data.table, dplyr, stringr, fst)
# get the names of the CSV files in the working directory
dir() %>% str_subset("\\.csv$") -> fn
# start one worker per core and register the cluster with foreach
(cl = detectCores() %>%
  makeCluster()) %>%
  registerDoParallel()
# read the files in parallel (every column as character), then bind them
system.time({
  big_df = foreach(i = fn,
                   .packages = "data.table") %dopar% {
    fread(i, colClasses = "character")
  } %>%
    rbindlist(fill = TRUE)
})
# end of parallel work: shut down the cluster created by makeCluster()
stopCluster(cl)
This should be faster than reading the files one at a time, as long as your computer has multiple cores. If you are dealing with big data, it is the preferred approach.
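To check whether the parallel version actually pays off on your machine, it may help to time a plain sequential read as a baseline. The following is only a minimal sketch, not part of the approach above: the folder `csv_demo`, the files `part1.csv` and `part2.csv`, and the names `big_df_seq` and `seq_time` are made up for illustration, and in practice you would point `list.files()` at your own shared folder instead.

```r
library(data.table)

# Hypothetical demo data: two small CSV files with the same fields,
# written to a temporary folder so the example is self-contained.
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
fwrite(data.table(id = 1:2, value = c("a", "b")), file.path(demo_dir, "part1.csv"))
fwrite(data.table(id = 3:4, value = c("c", "d")), file.path(demo_dir, "part2.csv"))

# list only the CSV files in that folder
fn <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# read each file one after another (every column as character) and bind them
seq_time <- system.time({
  big_df_seq <- rbindlist(lapply(fn, fread, colClasses = "character"), fill = TRUE)
})
```

Comparing `seq_time` with the timing printed by the parallel version shows whether the cluster start-up cost is worth it. For a handful of small files, the sequential version may well be the faster one.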