R data.table fread command: how to read large files with irregular separators?

迷失自我 2020-12-06 20:49

I have to work with a collection of 120 files of ~2 GB each (525600 lines x 302 columns). The goal is to compute some statistics and store the results in a clean SQLite database.

5 answers
  •  [愿得一人]
    2020-12-06 20:51

    If peak memory is not an issue, or you can stream the file in manageable chunks, the following gsub()/fread() hybrid should work. It converts every run of consecutive whitespace characters to a single delimiter of your choosing (e.g. "\t") before fread() parses the text:

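    library(data.table)

    # Read a whitespace-delimited file with fread() by first collapsing each
    # run of whitespace into a single delimiter (tab by default).
    # n = -1 reads all lines; extra arguments in ... are passed on to fread().
    fread_blank = function(inputFile, spaceReplace = "\t", n = -1, ...){
      fread(
        input = paste0(
          gsub(pattern = "[[:space:]]+",
               replacement = spaceReplace,
               x = readLines(inputFile, n = n)),
          collapse = "\n"),
        ...)
    }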
    

    I must agree with others that space-delimited files are not an ideal choice, but I come across them pretty often, whether I like it or not.
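
For the "stream it in chunks" case, here is a minimal sketch of how the same gsub()/fread() idea could be applied a block of lines at a time. The helper name `fread_blank_chunked`, the `chunk_size` default, and the choice to strip leading/trailing whitespace with trimws() are my own assumptions, not part of the function above. It also assumes a single header line and that every chunk parses to the same column types.

```r
library(data.table)

# Hypothetical helper (name and defaults are illustrative assumptions):
# read a whitespace-delimited file `chunk_size` lines at a time so that only
# one chunk of raw text is held in memory while it is being cleaned.
fread_blank_chunked <- function(input_file, chunk_size = 10000L, header = TRUE) {
  con <- file(input_file, open = "r")
  on.exit(close(con))

  # Strip leading/trailing whitespace, then collapse each run to one tab.
  clean <- function(lines) gsub("[[:space:]]+", "\t", trimws(lines))

  # Read the header line once so every chunk gets the same column names.
  col_names <- NULL
  if (header) {
    col_names <- strsplit(clean(readLines(con, n = 1L)), "\t", fixed = TRUE)[[1]]
  }

  chunks <- list()
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break
    # The trailing "\n" makes fread() treat even a one-line chunk as data
    # rather than as a file name.
    txt <- paste0(paste(clean(lines), collapse = "\n"), "\n")
    dt <- fread(input = txt, sep = "\t", header = FALSE)
    if (!is.null(col_names)) setnames(dt, col_names)
    chunks[[length(chunks) + 1L]] <- dt
  }
  rbindlist(chunks)
}
```

If even the parsed table is too large, the same loop could compute the per-chunk statistics and keep only those, instead of binding every chunk together at the end.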
