Fastest way to read in 100,000 .dat.gz files

耶瑟儿~ 2020-12-05 11:45

I have a few hundred thousand very small .dat.gz files that I want to read into R as efficiently as possible. I read in each file and then immediately aggregate and discard the data, so memory is not a concern.

3 Answers
  •  自闭症患者 2020-12-05 12:15

    I'm sort of surprised that this actually worked. Hopefully it works for your case too. I'm quite curious to know how the speed compares to reading the compressed data directly from R, file by file (albeit with a penalty for non-vectorization); a sketch of that approach follows the code below.

    library(data.table)
    # Column names come from the first header line of the concatenated files
    tblNames = fread(cmd = 'cat *.dat.gz | gunzip | head -n 1')[, colnames(.SD)]
    # Read everything in one pass, dropping each file's repeated header line ("Day...")
    tbl = fread(cmd = 'cat *.dat.gz | gunzip | grep -v "^Day"')
    setnames(tbl, tblNames)
    tbl
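
    For comparison, here is a minimal sketch of the file-by-file alternative mentioned above: read each compressed file on its own with fread() and stack the results with rbindlist(). The filename pattern is an assumption about your layout, and fread()'s transparent .gz decompression requires the R.utils package to be installed.

    library(data.table)
    # Pattern is an assumption; adjust to match how your files are named
    files = list.files(pattern = "\\.dat\\.gz$")
    # fread() decompresses each .gz file transparently (needs R.utils installed);
    # the per-file tables are then stacked into one
    tbl2 = rbindlist(lapply(files, fread))
    tbl2

    This loops over the files inside R, which is the non-vectorization penalty referred to above, but it avoids shelling out to cat/gunzip/grep and so also works on systems without those tools.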
    
