Trouble converting long list of data.frames (~1 million) to single data.frame using do.call and ldply

前端未结

关注

 4  1531

忘了有多久 2020-12-02 19:28

I know there are many questions here in SO about ways to convert a list of data.frames to a single data.frame using do.call or ldply, but this questions is about understandi

4条回答

轻奢々 (楼主)

2020-12-02 19:36

Your observation that the time taken increases exponentially with the number of data.frames suggests that breaking the rbinding into two stages could speed things up.

This simple experiment seems to confirm that that's a very fruitful path to take:

## Make a list of 50,000 data.frames
X <- replicate(50000, data.frame(a=rnorm(5), b=1:5), simplify=FALSE)

## First, rbind together all 50,000 data.frames in a single step
system.time({
    X1 <- do.call(rbind, X)
})
#    user  system elapsed 
# 137.08   57.98  200.08 


## Doing it in two stages cuts the processing time by >95%
##   - In Stage 1, 100 groups of 500 data.frames are rbind'ed together
##   - In Stage 2, the resultant 100 data.frames are rbind'ed
system.time({
    X2 <- lapply(1:100, function(i) do.call(rbind, X[((i*500)-499):(i*500)]))
    X3 <- do.call(rbind, X2)
}) 
#    user  system elapsed 
#    6.14    0.05    6.21 


## Checking that the results are the same
identical(X1, X3)
# [1] TRUE

0 讨论(0)

查看其它4个回答