Trouble converting long list of data.frames (~1 million) to single data.frame using do.call and ldply

Asked by 忘了有多久 on 2020-12-02 19:28 · 4 answers · 1559 views

I know there are many questions here on SO about ways to convert a list of data.frames to a single data.frame using do.call or ldply, but this question is about understanding …

4 Answers
  •  醉话见心, answered 2020-12-02 19:43

    You have a list of data.frames that each have a single row. If it is possible to convert each of those to a vector, I think that would speed things up a lot.
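    A hypothetical sketch of that vector approach (assuming every column is numeric, so `unlist` does not coerce types; the names here are made up for the example):

    ```r
    # Hypothetical sketch: if each one-row data.frame has only numeric
    # columns, unlist each element into a named vector, rbind the vectors
    # into a matrix, and convert to a data.frame once at the end.
    lst <- lapply(1:1000, function(i) data.frame(a = i, b = i^2))

    vecs <- lapply(lst, unlist)   # one named numeric vector per element
    mat  <- do.call(rbind, vecs)  # rbind on vectors is much cheaper
    df   <- as.data.frame(mat)    # single conversion at the end

    stopifnot(nrow(df) == 1000, identical(names(df), c("a", "b")))
    ```

    This avoids dispatching `rbind.data.frame` a million times, which is where most of the overhead goes.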

    However, assuming that they need to be data.frames, I'll create a function with code borrowed from Dominik's answer to "Can rbind be parallelized in R?"

    # Recursively rbind a list by combining adjacent pairs, halving the
    # list length on each pass, until a single object remains.
    do.call.rbind <- function(lst) {
      while (length(lst) > 1) {
        # indices of the first element of each pair
        idxlst <- seq(from = 1, to = length(lst), by = 2)
        lst <- lapply(idxlst, function(i) {
          # odd-length list: carry the last element forward unchanged
          if (i == length(lst)) {
            return(lst[[i]])
          }
          return(rbind(lst[[i]], lst[[i + 1]]))
        })
      }
      lst[[1]]
    }
    

    I have been using this function for several months and have found it to be faster and to use less memory than do.call(rbind, ...). (The disclaimer is that I've pretty much only used it on xts objects.)

    The more rows each data.frame has, and the more elements the list has, the more beneficial this function will be.

    If you have a list of 100,000 numeric vectors, do.call(rbind, ...) will be better. If you have a list of length one billion, this will be better.
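    To illustrate the first case, a minimal sketch (names invented for the example): when the elements are plain numeric vectors rather than data.frames, a single do.call(rbind, ...) is already efficient, because rbind on vectors has no per-element data.frame dispatch cost.

    ```r
    # Sketch of the simple case: for a list of plain numeric vectors,
    # one do.call(rbind, ...) builds the whole matrix in a single pass.
    vlst <- lapply(1:1000, function(i) c(lo = i, hi = i + 1))
    m <- do.call(rbind, vlst)
    stopifnot(identical(dim(m), c(1000L, 2L)))
    ```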

    > df <- lapply(1:10000, function(x) data.frame(x = sample(21, 21)))
    > library(rbenchmark)
    > benchmark(a=do.call(rbind, df), b=do.call.rbind(df))
    test replications elapsed relative user.self sys.self user.child sys.child
    1    a          100 327.728 1.755965   248.620   79.099          0         0
    2    b          100 186.637 1.000000   181.874    4.751          0         0
    

    The relative speed-up grows as the length of the list increases.
