Can rbind be parallelized in R?

无人共我 2020-12-01 03:14

As I sit here waiting for some R scripts to run, I was wondering: is there any way to parallelize rbind in R?

I am sitting here waiting for this call to complete.

6 answers
  •  甜味超标
    2020-12-01 03:42

    Here's a solution. It naturally extends to rbind.fill, merge, and other functions that operate on lists of data frames.

    But, as with all my answers and questions, please verify it yourself :)

    library(snowfall)
    library(rbenchmark)
    
    # Row-bind a list of data frames, optionally in parallel chunks.
    rbinder <- function(dfs, cores = NULL) {
      if (is.null(cores) || cores < 2) {
        # Serial path: a single rbind over the whole list
        do.call("rbind", dfs)
      } else {
        # Assign each element to one of `cores` contiguous chunks
        chunk_id <- ceiling(seq_along(dfs) * cores / length(dfs))
        chunks <- unname(split(dfs, chunk_id))
        suppressMessages(sfInit(parallel = TRUE, cpus = cores))
        # Make sure the cluster is shut down even if a worker fails
        on.exit(suppressMessages(sfStop()), add = TRUE)
        # rbind each chunk on its own worker, then combine the partial results
        partial <- sfLapply(chunks, function(x) do.call("rbind", x))
        do.call("rbind", partial)
      }
    }
    
    pieces <- lapply(seq(1000), function(.) data.frame(matrix(runif(1000), ncol=1000)))
    
    benchmark(do.call("rbind", pieces), rbinder(pieces), rbinder(pieces, cores=4), replications = 10)
    
    # Results on an Intel i5 3570k:
    #                         test replications elapsed relative user.self sys.self user.child sys.child
    # 1   do.call("rbind", pieces)           10  116.70    6.505    115.79     0.10         NA        NA
    # 3 rbinder(pieces, cores = 4)           10   17.94    1.000      1.67     2.12         NA        NA
    # 2            rbinder(pieces)           10  116.03    6.468    115.50     0.05         NA        NA
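
The chunk-then-combine pattern in this answer does not depend on rbind itself, which is one way to read the remark that it extends to other functions. Below is a minimal sketch of that idea using base R's parallel package rather than snowfall (a substitution, not what the answer used). The name `chunked_combine` and its arguments are hypothetical. It assumes the combining function accepts any number of arguments and can safely be applied in stages, which holds for rbind and cbind; a two-argument function such as merge would need Reduce instead of do.call.

```r
library(parallel)

# Hypothetical helper: combine a list in contiguous chunks, one chunk per
# worker, then combine the per-chunk results in the calling process.
chunked_combine <- function(xs, combine = rbind, cores = 2L) {
  # Map each element to one of `cores` contiguous chunks
  chunk_id <- ceiling(seq_along(xs) * cores / length(xs))
  # Drop the chunk names so they are not passed on as argument names
  chunks <- unname(split(xs, chunk_id))

  cl <- makeCluster(cores)
  # Shut the workers down even if one of them fails
  on.exit(stopCluster(cl), add = TRUE)

  # `combine` is passed explicitly so it is shipped to the workers
  partial <- parLapply(cl, chunks,
                       function(chunk, f) do.call(f, chunk),
                       f = combine)
  do.call(combine, partial)
}

# Small check that the chunked result matches the serial one
pieces  <- lapply(seq_len(20), function(i) data.frame(id = i, value = i^2))
serial  <- do.call(rbind, pieces)
chunked <- chunked_combine(pieces, combine = rbind, cores = 2L)

stopifnot(identical(dim(serial), dim(chunked)),
          identical(serial$value, chunked$value))
```

Whether this is actually faster depends on the data: starting the workers and shipping the chunks to them has a cost, so it is worth benchmarking against a plain do.call on your own list, as the answer does.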
    
