I know there are many questions here in SO about ways to convert a list of data.frames to a single data.frame using do.call or ldply, but this questions is about understandi
You have a list of data.frames that each have a single row. If it is possible to convert each of those to a vector, I think that would speed things up a lot.
However, assuming that they need to be data.frames, I'll create a function with code borrowed from Dominik's answer at Can rbind be parallelized in R?
do.call.rbind <- function (lst) {
while (length(lst) > 1) {
idxlst <- seq(from = 1, to = length(lst), by = 2)
lst <- lapply(idxlst, function(i) {
if (i == length(lst)) {
return(lst[[i]])
}
return(rbind(lst[[i]], lst[[i + 1]]))
})
}
lst[[1]]
}
I have been using this function for several months, and have found it to be faster and use less memory than do.call(rbind, ...) [the disclaimer is that I've pretty much only used it on xts objects]
The more rows that each data.frame has, and the more elements that the list has, the more beneficial this function will be.
If you have a list of 100,000 numeric vectors, do.call(rbind, ...) will be better. If you have list of length one billion, this will be better.
> df <- lapply(1:10000, function(x) data.frame(x = sample(21, 21)))
> library(rbenchmark)
> benchmark(a=do.call(rbind, df), b=do.call.rbind(df))
test replications elapsed relative user.self sys.self user.child sys.child
1 a 100 327.728 1.755965 248.620 79.099 0 0
2 b 100 186.637 1.000000 181.874 4.751 0 0
The relative speed up will be exponentially better as you increase the length of the list.