How to bind data.table without increasing the memory consumption?

广开言路 2020-12-21 08:28

I have a few huge data.tables dt_1, dt_2, ..., dt_N with the same columns. I want to bind them together into a single data.table. If I use

dt          


        
3 answers
  •  清酒与你
    2020-12-21 09:20

    You can remove your data.tables after you've bound them. The doubled memory usage arises because the new data.table consists of copies of the originals, so both exist at the same time.

    Illustration:

    library(data.table)

    # create some data
    nobs <- 10000
    d1 <- d2 <- d3 <- data.table(a = rnorm(nobs), b = rnorm(nobs))
    dt <- rbindlist(list(d1, d2, d3))
    

    Then we can look at the memory usage per object:

    sort( sapply(ls(),function(x){object.size(get(x))}))
      nobs     d1     d2     d3     dt 
        48 161232 161232 161232 481232 
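
    A side note on the illustration above: because d1, d2 and d3 come from one chained assignment, they should be three names for the same object rather than three copies, while object.size() reports each name separately. A minimal sketch to check this, assuming data.table's address() helper is available in the installed version (e1 is a hypothetical extra table added only for contrast):

    ```r
    library(data.table)

    nobs <- 10000
    d1 <- d2 <- d3 <- data.table(a = rnorm(nobs), b = rnorm(nobs))
    # e1 is a hypothetical, separately allocated table for comparison
    e1 <- data.table(a = rnorm(nobs), b = rnorm(nobs))

    # address() returns the memory address of an object as a string
    same_object <- identical(address(d1), address(d2)) &&
      identical(address(d2), address(d3))
    separate_object <- !identical(address(d1), address(e1))
    print(c(same_object = same_object, separate_object = separate_object))
    ```

    With real, separately loaded tables each one occupies its own memory, which is the situation the question describes.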
    

    If the memory usage is so large that the separate data.tables and the combined data.table cannot coexist, we can use a for-loop with get() to build the combined data.table and delete the individual ones as we go. (Shocking, but IMHO this case warrants it: there are only a small number of data.tables, and the loop is easy to read and understand.)

    mydts <- c("d1", "d2", "d3") # vector of data.table names

    dt <- data.table() # empty data.table to bind objects to

    for (d in mydts) {
      dt <- rbind(dt, get(d))
      rm(list = d)
      gc() #garbage collection
    }
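
    Whether the loop above actually lowers the peak depends on the data, since each rbind() still allocates a new combined table while the previous one exists. One way to check is to measure it. The following is a rough sketch, not a benchmark: it assumes three separately allocated example tables (hypothetical data, same names as above) and uses base R's gc(reset = TRUE) to reset the "max used" counters before the loop runs.

    ```r
    library(data.table)

    # Hypothetical data: three separately allocated tables
    nobs <- 10000
    d1 <- data.table(a = rnorm(nobs), b = rnorm(nobs))
    d2 <- data.table(a = rnorm(nobs), b = rnorm(nobs))
    d3 <- data.table(a = rnorm(nobs), b = rnorm(nobs))
    mydts <- c("d1", "d2", "d3")

    # Reset the "max used" counters so the peak refers to the loop only
    invisible(gc(reset = TRUE))

    dt <- data.table()
    for (d in mydts) {
      dt <- rbind(dt, get(d))
      rm(list = d)
      gc()
    }

    # The last column of gc()'s matrix is the peak (in Mb) since the reset
    mem <- gc()
    peak_mb <- mem["Vcells", ncol(mem)]
    print(peak_mb)
    print(nrow(dt))
    ```

    Running the same measurement around a single rbindlist(list(d1, d2, d3)) call gives a number to compare against.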
    
