Trying to merge multiple csv files in R

前端未结

I'm attempting to merge multiple CSV files using R. All of the CSV files have the same fields and are all in a shared folder containing only these CSV files. I've attempted

6 Answers

一生所求

2020-12-03 22:05

For a shorter, faster solution:

library(dplyr)
library(readr)
df <- list.files(path="yourpath", full.names = TRUE) %>% 
  lapply(read_csv) %>% 
  bind_rows

无人共我

2020-12-03 22:19

I used the same function but added all = TRUE to the merge call, and it worked just fine.

The code I used is as follows:

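multmerge = function(mypath){
  # list every file in the folder with its full path
  filenames = list.files(path = mypath, full.names = TRUE)
  # read each CSV file into a data frame
  datalist = lapply(filenames, function(x){read.csv(file = x, header = TRUE)})
  # merge the data frames pairwise, keeping unmatched rows from both sides
  Reduce(function(x, y) {merge(x, y, all = TRUE)}, datalist)
}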

full_data = multmerge("path_name for your csv folder")

Hope this helps. Cheers!
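To see what merge with all = TRUE does when every column is shared, here is a minimal sketch. The data frames x and y are made up for illustration and are not from the question. Because merge joins on all common columns, a row that appears in both inputs shows up only once, whereas rbind keeps both copies.

```r
# Two small, hypothetical data frames with identical columns.
# The row (2, "b") is present in both.
x <- data.frame(id = 1:2, v = c("a", "b"))
y <- data.frame(id = 2:3, v = c("b", "c"))

# merge() joins on every shared column, so the common row appears once.
merged <- merge(x, y, all = TRUE)

# rbind() simply stacks the rows, so the common row appears twice.
stacked <- rbind(x, y)

nrow(merged)   # 3
nrow(stacked)  # 4
```

If the files may overlap and duplicates should be kept, stacking is likely the better fit; if they should be collapsed, the merge behaviour above may be what you want.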

时光说笑

2020-12-03 22:20

Another option that has proved to work for my setup:

library(data.table)  # provides fread and rbindlist

multmerge = function(path){
  filenames=list.files(path=path, full.names=TRUE)
  rbindlist(lapply(filenames, fread))
}


path <- "Dropbox/rstudio-share/dataset/MB"
DF <- multmerge(path)

If you need more granular control over how each CSV file is loaded, you can replace fread with a function of your own, like so:

multmerge = function(path){
  filenames=list.files(path=path, full.names=TRUE)
  rbindlist(lapply(filenames, function(x){read.csv(x, stringsAsFactors = F, sep=';')}))
}

广开言路

2020-12-03 22:21

Here is the best approach I have used:

library(pacman)
p_load(doParallel,data.table,dplyr,stringr,fst)

# get the file name
dir() %>% str_subset("\\.csv$") -> fn

# use parallel setting
(cl = detectCores() %>% 
  makeCluster()) %>% 
  registerDoParallel()

# read and bind
system.time({
  big_df = foreach(i = fn,
                    .packages = "data.table") %dopar% {
                      fread(i, colClasses = "character")
                    } %>% 
    rbindlist(fill = T)
})

# end of parallel work: cl was created explicitly with makeCluster, so stop it with stopCluster
stopCluster(cl)

This should be faster as long as your computer has multiple cores. If you are dealing with big data, it is the preferred approach.

佛祖请我去吃肉

2020-12-03 22:22

If all your csv files have exactly the same fields (column names) and you want simply to combine them vertically, you should use rbind instead of merge:

> a
             A         B
[1,]  2.471202 38.949232
[2,] 16.935362  6.343694
> b
            A          B
[1,] 0.704630  0.1132538
[2,] 4.477572 11.8869057
> rbind(a, b)
             A          B
[1,]  2.471202 38.9492316
[2,] 16.935362  6.3436939
[3,]  0.704630  0.1132538
[4,]  4.477572 11.8869057
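Applied to a folder of CSV files, the same idea can be sketched in base R as below. The folder name csv_demo and the two small files are hypothetical, created in a temporary directory only so the example runs on its own; point list.files at your own folder instead.

```r
# Hypothetical setup: write two small CSV files into a temporary folder.
csv_dir <- file.path(tempdir(), "csv_demo")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(A = c(1, 2), B = c(3, 4)),
          file.path(csv_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(A = c(5, 6), B = c(7, 8)),
          file.path(csv_dir, "b.csv"), row.names = FALSE)

# List the CSV files, read each one, and stack them vertically.
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
tables <- lapply(files, read.csv)
combined <- do.call(rbind, tables)

nrow(combined)  # 4
```

do.call(rbind, tables) calls rbind once with every data frame as an argument, which requires all files to have the same column names.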

-上瘾入骨i

2020-12-03 22:27

Your code worked for me, but you need to change header = True to header = TRUE.
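The reason is that R is case-sensitive: TRUE is a reserved logical constant, while True is just an undefined name. A small sketch, using an inline CSV string made up for illustration:

```r
# header = TRUE works: the first line is used for the column names.
ok <- read.csv(text = "a,b\n1,2", header = TRUE)
names(ok)

# header = True fails, because no object called True exists.
# tryCatch captures the error message instead of stopping the script.
msg <- tryCatch(
  read.csv(text = "a,b\n1,2", header = True),
  error = function(e) conditionMessage(e)
)
msg
```

The shorthand T also works by default, but it is an ordinary variable that can be overwritten, so spelling out TRUE is the safer habit.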
