Fast reading and combining several files using data.table (with fread)

猫巷女王i 2020-12-23 16:55

I have several different txt files with the same structure. Now I want to read them into R using fread, and then union them into a bigger dataset.

## First          


        
2 Answers
  • 2020-12-23 17:36

    I've rewritten the code to do this far too many times, so I finally rolled it into a handy function, below.

    library(data.table)  # provides fread() and rbindlist()

    data.table_fread_mult <- function(filepaths = NULL, dir = NULL, recursive = FALSE, extension = NULL, ...){
      # fread() multiple filepaths and then combine the results into a single data.table
      # This function has two interfaces: either
      # 1) provide `filepaths` as a character vector of filepaths to read or 
      # 2) provide `dir` (and optionally `extension` and `recursive`) to identify the directory to read from
      # ... should be arguments to pass on to fread()
      
      # Use the scalar operators && and || inside if() conditions
      if(!is.null(filepaths) && (!is.null(dir) || !is.null(extension))){
        stop("If `filepaths` is given, `dir` and `extension` should be NULL")
      } else if(is.null(filepaths) && is.null(dir)){
        stop("If `filepaths` is not given, `dir` should be given")
      }
      
      # If filepaths isn't given, build it from dir, recursive, extension
      if(is.null(filepaths)){
        filepaths <- list.files(
          path = dir, 
          full.names = TRUE, 
          recursive = recursive, 
          pattern = paste0(extension, "$")
        )
      }
      
      # Read and combine files
      return(rbindlist(lapply(filepaths, fread, ...), use.names = TRUE))
    }
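
    A minimal sketch of the two steps the function wraps (list the files, then read and bind them), assuming the data.table package is installed. The temporary directory, file names, and columns are made up for illustration and are not part of the original answer.

    ```r
    library(data.table)

    # Hypothetical sample data: two small comma-separated .txt files in a temp directory
    demo_dir <- file.path(tempdir(), "fread_mult_demo")
    dir.create(demo_dir, showWarnings = FALSE)
    fwrite(data.table(ID = 1:2, value = c(10, 20)), file.path(demo_dir, "a.txt"))
    fwrite(data.table(ID = 3:4, value = c(30, 40)), file.path(demo_dir, "b.txt"))

    # The same kind of lookup the function performs when given `dir` and `extension`
    filepaths <- list.files(demo_dir, full.names = TRUE, pattern = "\\.txt$")

    # Extra arguments (here sep) are forwarded to fread() for every file
    combined <- rbindlist(lapply(filepaths, fread, sep = ","), use.names = TRUE)
    print(combined)
    ```

    Note that `list.files()` treats `pattern` as a regular expression, so an `extension` such as ".txt" is probably better passed with the dot escaped ("\\.txt").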
    
  • 2020-12-23 17:44

    Use rbindlist(), which is designed to rbind a list of data.tables together...

    mylist <- lapply(all.files, readdata)
    mydata <- rbindlist( mylist )
    

    And as @Roland says, do not set the key in each iteration of your function!

    So, in summary, this is best:

    l <- lapply(all.files, fread, sep=",")
    dt <- rbindlist( l )
    setkey( dt , ID, date )
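
    The same pattern as a self-contained sketch, assuming data.table is installed. The sample files and the `source_file` column name are hypothetical; `idcol` is an optional `rbindlist()` argument that records which list element each row came from, which can help when the files need to be told apart afterwards.

    ```r
    library(data.table)

    # Hypothetical sample files standing in for all.files
    tmp <- tempfile("rbind_demo_")
    dir.create(tmp)
    fwrite(data.table(ID = c(2L, 1L), date = c("2020-01-02", "2020-01-01")),
           file.path(tmp, "x.txt"))
    fwrite(data.table(ID = c(1L, 3L), date = c("2020-01-03", "2020-01-01")),
           file.path(tmp, "y.txt"))
    all.files <- list.files(tmp, full.names = TRUE)

    # Read every file, then name the list so idcol can label rows by file
    l <- lapply(all.files, fread, sep = ",")
    names(l) <- basename(all.files)
    dt <- rbindlist(l, idcol = "source_file")

    # Set the key once, after binding, rather than once per file
    setkey(dt, ID, date)
    print(dt)
    ```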
    