How to merge csv files from nested folders in R

陌清茗 2020-12-21 10:52

I have a large collection of csv files that are in different folders and in folders within folders that I need to merge into one file. It would be easy if they were all in o

3 Answers
  • 2020-12-21 11:25

    Here is a solution with dplyr

    library(dplyr)

    # list all files ending in .csv under the directory root, including subfolders
    dir(root, pattern = '\\.csv$', recursive = TRUE, full.names = TRUE) %>%
      # read files into data frames
      lapply(FUN = read.csv) %>%
      # bind all data frames into a single data frame
      # (bind_rows() replaces rbind_all(), which is no longer available in current dplyr)
      bind_rows() %>%
      # write into a single csv file
      write.csv("all.csv")
    
  • 2020-12-21 11:34

    This solution assumes that all *.csv files have the same structure.

    (Untested)

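    fileList <- list.files(
      pattern = "\\.csv$", # regular expression: file names ending in .csv
      recursive = TRUE,
      full.names = TRUE
    )
    
    completeCSV <- data.frame()
    
    for (file in fileList) {
      print(file) # for debug: print current file
      curDF <- read.csv(file) # could also be read.csv2()
      if (nrow(completeCSV) == 0) {
        completeCSV <- curDF
      } else {
        # rbind() returns a new data frame, so the result has to be assigned
        completeCSV <- rbind(completeCSV, curDF)
      }
    }
    
    # without a file argument, write.csv() only prints to the console
    write.csv(completeCSV, file = "all.csv", row.names = FALSE) # could also be write.csv2()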
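
    Since rbind() stops with an error when the column names of two data frames differ, it may be worth checking the "same structure" assumption before merging. The following is only a sketch: same_structure() is a hypothetical helper (not part of the answer above), and the demo files are created in a temporary folder purely for illustration. The helper is deliberately strict and also requires the columns to be in the same order.

    ```r
    # Hypothetical helper: TRUE if every file has the same column names, in the same order.
    same_structure <- function(files) {
      if (length(files) == 0) return(TRUE)
      # nrows = 1 reads the header plus at most one data row, which keeps the check cheap
      headers <- lapply(files, function(f) names(read.csv(f, nrows = 1)))
      all(vapply(headers, identical, logical(1), headers[[1]]))
    }

    # Demo data (made up): two files with identical columns in a nested folder tree
    demo_dir <- tempfile("csv_demo_")
    dir.create(file.path(demo_dir, "sub"), recursive = TRUE)
    write.csv(data.frame(id = 1:2, value = c(10, 20)),
              file.path(demo_dir, "a.csv"), row.names = FALSE)
    write.csv(data.frame(id = 3, value = 30),
              file.path(demo_dir, "sub", "b.csv"), row.names = FALSE)

    demo_files <- list.files(demo_dir, pattern = "\\.csv$",
                             recursive = TRUE, full.names = TRUE)
    structure_ok <- same_structure(demo_files)
    print(structure_ok)
    ```

    If the check returns FALSE, the files need to be harmonised (or merged with a function that fills missing columns) before the loop above will work.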
    
  • 2020-12-21 11:47

    You can use dir() with recursive set to TRUE to list all files in the folder tree, and you can use pattern to define a regular expression to filter the .csv files. An example:

    csv_files <- dir(pattern='.*[.]csv$', recursive = TRUE)
    

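    or even better and simpler (thanks to speendo for his comment; the dot is escaped here so that it only matches a literal .):

    csv_files <- dir(pattern='\\.csv$', recursive = TRUE)
    

    The explanation:

    • pattern='\\.csv$': The pattern argument must be a regular expression, not a shell wildcard, that filters the file names. This regex keeps only the file names that end with .csv. A wildcard-style pattern such as '*.csv$' is often seen, but its leading * is a shell habit rather than regex syntax, and its unescaped . matches any character.

      If you only want the files whose names start with data, try a pattern like this: pattern='^data.*\\.csv$'

    • recursive=TRUE: Makes dir() traverse recursively through all folders below the working directory.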
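
    To see what such patterns actually select, they can be tried on a plain character vector with grepl(), which uses the same regular-expression rules as the pattern argument. This is only an illustration; the file names below are made up.

    ```r
    # Made-up file names for illustration
    file_names <- c("data1.csv", "notes.txt", "data_csv", "old.csv.bak", "report.csv")

    # Names ending in a literal ".csv"
    ends_with_csv <- grepl("\\.csv$", file_names)
    file_names[ends_with_csv]
    # "data1.csv" "report.csv"

    # Names starting with "data" and ending in ".csv"
    starts_with_data <- grepl("^data.*\\.csv$", file_names)
    file_names[starts_with_data]
    # "data1.csv"

    # glob2rx() from utils translates a shell wildcard into the equivalent regex
    wildcard_regex <- glob2rx("*.csv")
    wildcard_regex
    ```

    glob2rx() is handy if you prefer to think in wildcards: its result can be passed directly as the pattern argument of dir().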

    After you get the file list, and assuming all of them have the same structure (i.e. all the files have the same columns), you can merge them with read.csv() and rbind():

    # seq_along() is safe even when csv_files is empty (1:length() would yield 1, 0)
    for(i in seq_along(csv_files)) {
      if(i == 1)
        df <- read.csv(csv_files[i])
      else
        df <- rbind(df, read.csv(csv_files[i]))
    }
    

    Ramnath suggests in his comment a faster way to merge the .csv files (again, assuming all of them have the same structure):

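    library(dplyr)
    library(readr) # read_csv() comes from readr, not dplyr
    # bind_rows() replaces rbind_all(), which is no longer available in current dplyr
    df <- bind_rows(lapply(csv_files, read_csv))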
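
    If you would rather not depend on extra packages, the same read-then-bind idea can be written in base R with do.call(rbind, ...), which binds all data frames in one call instead of growing df inside a loop. A minimal sketch, again assuming all files share the same columns; read_with_source() and the source_file column are hypothetical additions that record which file each row came from, and the demo files are made up.

    ```r
    # Demo data (made up): two CSV files at different depths of a temporary folder tree
    demo_dir <- tempfile("merge_demo_")
    dir.create(file.path(demo_dir, "nested", "deeper"), recursive = TRUE)
    write.csv(data.frame(id = 1:2, value = c(10, 20)),
              file.path(demo_dir, "a.csv"), row.names = FALSE)
    write.csv(data.frame(id = 3L, value = 30),
              file.path(demo_dir, "nested", "deeper", "b.csv"), row.names = FALSE)

    csv_files <- dir(demo_dir, pattern = "\\.csv$", recursive = TRUE, full.names = TRUE)

    # Hypothetical helper: read one file and tag every row with the file it came from
    read_with_source <- function(path) {
      one_file <- read.csv(path)
      one_file$source_file <- rep(basename(path), nrow(one_file))
      one_file
    }

    # Read every file, then bind all data frames in a single rbind() call
    merged <- do.call(rbind, lapply(csv_files, read_with_source))
    print(merged)
    ```

    Note that full.names = TRUE is used here so that the paths stay valid regardless of the current working directory.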
    