Trouble converting long list of data.frames (~1 million) to single data.frame using do.call and ldply

忘了有多久 2020-12-02 19:28

I know there are many questions here on SO about ways to convert a list of data.frames to a single data.frame using do.call or ldply, but this question is about understandi

4 Answers
  •  轻奢々
    2020-12-02 19:36

    Your observation that the time taken increases exponentially with the number of data.frames suggests that breaking the rbinding into two stages could speed things up.

    This simple experiment seems to confirm that this is a very fruitful path to take:

    ## Make a list of 50,000 data.frames
    X <- replicate(50000, data.frame(a=rnorm(5), b=1:5), simplify=FALSE)
    
    ## First, rbind together all 50,000 data.frames in a single step
    system.time({
        X1 <- do.call(rbind, X)
    })
    #    user  system elapsed 
    # 137.08   57.98  200.08 
    
    
    ## Doing it in two stages cuts the processing time by >95%
    ##   - In Stage 1, 100 groups of 500 data.frames are rbind'ed together
    ##   - In Stage 2, the resultant 100 data.frames are rbind'ed
    system.time({
        X2 <- lapply(1:100, function(i) do.call(rbind, X[((i*500)-499):(i*500)]))
        X3 <- do.call(rbind, X2)
    }) 
    #    user  system elapsed 
    #    6.14    0.05    6.21 
    
    
    ## Checking that the results are the same
    identical(X1, X3)
    # [1] TRUE
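
The two-stage idea above can be wrapped in a small helper so that the list length does not have to be an exact multiple of the group size. This is only a sketch: the function name `rbind_in_chunks` and its `chunk_size` argument are hypothetical and not part of the original experiment, and the best chunk size will likely depend on the data.

```r
## Hypothetical helper (not from the original answer): rbind a list of
## data.frames in chunks (Stage 1), then rbind the chunk results (Stage 2).
rbind_in_chunks <- function(dfs, chunk_size = 500) {
    n <- length(dfs)
    if (n == 0) return(NULL)

    ## Map each list element to a chunk number: 1,1,...,2,2,...
    ## The last chunk may be shorter than chunk_size.
    chunk_id <- ceiling(seq_len(n) / chunk_size)

    ## Stage 1: rbind the data.frames within each chunk
    stage1 <- lapply(split(dfs, chunk_id),
                     function(chunk) do.call(rbind, chunk))

    ## Stage 2: rbind the per-chunk results.
    ## unname() drops the chunk names that split() adds, so they are not
    ## pasted onto the row names of the result.
    do.call(rbind, unname(stage1))
}

## Small usage example: 1,200 data.frames, so the last chunk holds only 200
set.seed(1)
X_small <- replicate(1200, data.frame(a = rnorm(5), b = 1:5), simplify = FALSE)
res <- rbind_in_chunks(X_small, chunk_size = 500)

## Compare against the single-step rbind
same_as_single_step <- identical(res, do.call(rbind, X_small))
```

Comparing against the single-step `do.call(rbind, ...)` on a small list, as in the last line, is a cheap way to check the helper before running it on the full data.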
    
