How to bind data.table without increasing the memory consumption?

广开言路 2020-12-21 08:28

I have a few huge data.tables dt_1, dt_2, ..., dt_N, all with the same columns. I want to bind them together into a single data.table. If I use

dt          
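For reference, the usual in-memory way to bind such tables (a minimal sketch with small hypothetical stand-ins for the huge tables) is `rbindlist()`; note that it still needs enough memory to hold the inputs and the result at the same time, which is what motivates the temp-file workaround below:

```r
library(data.table)

# Hypothetical small stand-ins for the huge tables in the question
dt_1 <- data.table(a = 1:3, b = letters[1:3])
dt_2 <- data.table(a = 4:6, b = letters[4:6])

# rbindlist() builds the result in a single pass and is the idiomatic
# data.table way to stack tables; it is faster than do.call(rbind, ...),
# but the inputs and the combined result coexist in memory at its peak
dt <- rbindlist(list(dt_1, dt_2))
```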


        
3 Answers
  •  温柔的废话
    2020-12-21 09:22

    Another approach: use a temporary file to do the 'bind':

    library(data.table)

    nobs <- 10000
    d1 <- d2 <- d3 <- data.table(a = rnorm(nobs), b = rnorm(nobs))
    ll <- c('d1', 'd2', 'd3')
    tmp <- tempfile()
    
    # Write all tables, writing the header only for the first one
    for (i in seq_along(ll)) {
      write.table(get(ll[i]), tmp, append = (i != 1), row.names = FALSE, col.names = (i == 1))
    }
    
    # Remove the original objects from memory (the gc would reclaim them
    # anyway if memory is needed when loading the file)
    rm(list = ll)
    
    # Read the file into the new object
    dt <- fread(tmp)
    
    # Remove the temporary file
    unlink(tmp)
    

    This is obviously slower than the rbind method, but if you are under memory pressure, it will not be slower than forcing the system to swap out memory pages.

    Of course, if your original objects are loaded from files in the first place, prefer concatenating the files before loading them into R, using a tool better suited to working with files (cat, awk, etc.).
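    One way to sketch that last suggestion from within R: `fread()` accepts a `cmd` argument, so awk can concatenate the CSV parts (keeping only the first file's header) and stream the result straight into a single table. The file names and the setup step here are hypothetical; in practice the parts would already exist on disk:

    ```r
    library(data.table)

    # Hypothetical setup: write three small parts to CSV so the example
    # is self-contained (in practice these files would already exist)
    for (f in c("d1.csv", "d2.csv", "d3.csv"))
      fwrite(data.table(a = 1:2, b = 3:4), f)

    # awk keeps the header of the first file (NR==1) and skips the header
    # of the others (FNR>1); fread() reads the concatenated stream, so only
    # the combined table is ever held in R's memory
    dt <- fread(cmd = "awk 'FNR>1 || NR==1' d1.csv d2.csv d3.csv")

    unlink(c("d1.csv", "d2.csv", "d3.csv"))
    ```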
