Trying to merge multiple csv files in R


I'm attempting to merge multiple CSV files using R. All of the CSV files have the same fields and are all in a shared folder that contains only these CSV files. I've attempted

6 Answers
  • 2020-12-03 22:05

    For a shorter, faster solution:

    library(dplyr)
    library(readr)
    df <- list.files(path="yourpath", full.names = TRUE) %>% 
      lapply(read_csv) %>% 
      bind_rows 
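
    If the folder might ever contain files other than CSVs, list.files can be restricted with its pattern argument (an optional variant of the above):

    library(dplyr)
    library(readr)

    # only pick up files whose names end in .csv
    df <- list.files(path = "yourpath", pattern = "\\.csv$", full.names = TRUE) %>%
      lapply(read_csv) %>%
      bind_rows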
    
  • 2020-12-03 22:19

    I tried the same function but included all = TRUE in the merge call, and it worked just fine.

    The code I used is as follows:

    multmerge = function(mypath){
      filenames=list.files(path=mypath, full.names=TRUE)
      datalist = lapply(filenames, function(x){read.csv(file=x,header=T)})
      Reduce(function(x,y) {merge(x,y,all = TRUE)}, datalist)
    }
    
    full_data = multmerge("path_name for your csv folder")
    

    Hope this helps. Cheers!
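
    For context, all = TRUE tells merge to keep non-matching rows from both data frames (a full outer join), so rows are not silently dropped when the files do not line up perfectly. A tiny made-up illustration (the data frames are invented for the example):

    x <- data.frame(id = 1:2, a = c("p", "q"))
    y <- data.frame(id = 2:3, b = c("r", "s"))

    merge(x, y)              # inner join: only id 2 is kept
    merge(x, y, all = TRUE)  # full outer join: ids 1, 2 and 3 are kept, with NAs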

  • 2020-12-03 22:20

    Another option that has proved to work for my setup:

    library(data.table)  # provides fread() and rbindlist()

    multmerge = function(path){
      filenames = list.files(path = path, full.names = TRUE)
      rbindlist(lapply(filenames, fread))
    }
    
    
    path <- "Dropbox/rstudio-share/dataset/MB"
    DF <- multmerge(path)
    

    If you need more granular control over your CSV files during the loading process, you can replace fread with a custom function, like so:

    multmerge = function(path){
      filenames=list.files(path=path, full.names=TRUE)
      rbindlist(lapply(filenames, function(x){read.csv(x, stringsAsFactors = F, sep=';')}))
    }
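
    For instance (the extra argument values below are purely illustrative), column types and missing-value codes can also be fixed at read time:

    multmerge = function(path){
      filenames = list.files(path = path, full.names = TRUE)
      rbindlist(lapply(filenames, function(x){
        read.csv(x, stringsAsFactors = FALSE, sep = ";",
                 colClasses = "character", na.strings = c("", "NA"))
      }))
    }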
    
  • 2020-12-03 22:21

    Let me share the fastest approach I have found:

    library(pacman)
    p_load(doParallel,data.table,dplyr,stringr,fst)
    
    # get the file name
    dir() %>% str_subset("\\.csv$") -> fn
    
    # use parallel setting
    (cl = detectCores() %>% 
      makeCluster()) %>% 
      registerDoParallel()
    
    # read and bind
    system.time({
      big_df = foreach(i = fn,
                        .packages = "data.table") %dopar% {
                          fread(i, colClasses = "character")
                        } %>% 
        rbindlist(fill = T)
    })
    
    # end of parallel work
    stopCluster(cl)
    

    This should be faster the more cores your computer has. If you are dealing with big data, this approach is preferable.

  • 2020-12-03 22:22

    If all your CSV files have exactly the same fields (column names) and you simply want to stack them vertically, you should use rbind instead of merge:

    > a
                 A         B
    [1,]  2.471202 38.949232
    [2,] 16.935362  6.343694
    > b
                A          B
    [1,] 0.704630  0.1132538
    [2,] 4.477572 11.8869057
    > rbind(a, b)
                 A          B
    [1,]  2.471202 38.9492316
    [2,] 16.935362  6.3436939
    [3,]  0.704630  0.1132538
    [4,]  4.477572 11.8869057
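
    Applied to the question's setup, a minimal sketch (the folder path is a placeholder) that reads every CSV and stacks the rows with rbind:

    files <- list.files("yourpath", pattern = "\\.csv$", full.names = TRUE)
    combined <- do.call(rbind, lapply(files, read.csv))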
    
  • 2020-12-03 22:27

    Your code worked for me, but you need to change header = True to header = TRUE.
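
    Since the original code is not shown in full, this is only an illustrative corrected call (the file name is hypothetical):

    # TRUE must be upper case (or abbreviated as T); True is not a valid R constant
    dat <- read.csv("example.csv", header = TRUE)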
