Trouble converting long list of data.frames (~1 million) to single data.frame using do.call and ldply

Asked by 忘了有多久 on 2020-12-02 19:28 · 4 answers · 1559 views

I know there are many questions here on SO about ways to convert a list of data.frames to a single data.frame using do.call or ldply, but this question is about understanding …

4 Answers
  •  醉话见心, answered 2020-12-02 19:43

    You have a list of data.frames that each have a single row. If it is possible to convert each of those to a vector, I think that would speed things up a lot.
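    A hypothetical sketch of that vector approach (assuming every column is numeric, so `unlist` does not coerce types; the names here are made up for the example):

    ```r
    # Hypothetical sketch: if each one-row data.frame has only numeric
    # columns, unlist each element into a named vector, rbind the vectors
    # into a matrix, and convert to a data.frame once at the end.
    lst <- lapply(1:1000, function(i) data.frame(a = i, b = i^2))

    vecs <- lapply(lst, unlist)   # one named numeric vector per element
    mat  <- do.call(rbind, vecs)  # rbind on vectors is much cheaper
    df   <- as.data.frame(mat)    # single conversion at the end

    stopifnot(nrow(df) == 1000, identical(names(df), c("a", "b")))
    ```

    This avoids dispatching `rbind.data.frame` a million times, which is where most of the overhead goes.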

    However, assuming that they need to be data.frames, I'll create a function with code borrowed from Dominik's answer to "Can rbind be parallelized in R?"

    # Recursively rbind a list by combining adjacent pairs, halving the
    # list length on each pass, until a single object remains.
    do.call.rbind <- function(lst) {
      while (length(lst) > 1) {
        # indices of the first element of each pair
        idxlst <- seq(from = 1, to = length(lst), by = 2)
        lst <- lapply(idxlst, function(i) {
          # odd-length list: carry the last element forward unchanged
          if (i == length(lst)) {
            return(lst[[i]])
          }
          return(rbind(lst[[i]], lst[[i + 1]]))
        })
      }
      lst[[1]]
    }
    

    I have been using this function for several months and have found it to be faster and to use less memory than do.call(rbind, ...). (The disclaimer is that I've pretty much only used it on xts objects.)

    The more rows each data.frame has, and the more elements the list has, the more beneficial this function will be.

    If you have a list of 100,000 numeric vectors, do.call(rbind, ...) will be better. If you have a list of length one billion, this will be better.
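    To illustrate the first case, a minimal sketch (names invented for the example): when the elements are plain numeric vectors rather than data.frames, a single do.call(rbind, ...) is already efficient, because rbind on vectors has no per-element data.frame dispatch cost.

    ```r
    # Sketch of the simple case: for a list of plain numeric vectors,
    # one do.call(rbind, ...) builds the whole matrix in a single pass.
    vlst <- lapply(1:1000, function(i) c(lo = i, hi = i + 1))
    m <- do.call(rbind, vlst)
    stopifnot(identical(dim(m), c(1000L, 2L)))
    ```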

    > df <- lapply(1:10000, function(x) data.frame(x = sample(21, 21)))
    > library(rbenchmark)
    > benchmark(a=do.call(rbind, df), b=do.call.rbind(df))
    test replications elapsed relative user.self sys.self user.child sys.child
    1    a          100 327.728 1.755965   248.620   79.099          0         0
    2    b          100 186.637 1.000000   181.874    4.751          0         0
    

    The relative speed-up grows as the length of the list increases.
