R: speeding up “group by” operations

挽巷 · 2020-11-28 19:30

I have a simulation that has a huge aggregate-and-combine step right in the middle. I prototyped this process using plyr's ddply() function, which works great for a huge percentage of my needs, but I need this aggregation step to be faster.
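
For concreteness, here is a minimal sketch of the kind of data and ddply() call I have in mind; the column names match the grouping columns used in the answers below, but the sizes and values are made up for illustration:

    library(plyr)

    # illustrative sample data; sizes and distributions are assumptions, not the real simulation
    n <- 2e5
    myDF <- data.frame(
      year    = sample(2000:2010, n, replace = TRUE),
      state   = sample(state.abb, n, replace = TRUE),
      group1  = sample(letters[1:5], n, replace = TRUE),
      group2  = sample(letters[1:5], n, replace = TRUE),
      myFact  = rnorm(n),
      weights = runif(n)
    )

    # the plyr prototype: one weighted mean per year/state/group1/group2 cell
    system.time(
      res <- ddply(myDF, .(year, state, group1, group2), summarise,
                   wmean = weighted.mean(myFact, weights))
    )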

5 Answers
  •  死守一世寂寞
    2020-11-28 20:05

    Further 2x speedup and more concise code:

    library(data.table)
    # build a keyed data.table; the key columns are also the grouping columns
    dtb <- data.table(myDF, key = "year,state,group1,group2")
    system.time(
      res <- dtb[, weighted.mean(myFact, weights),
                 by = list(year, state, group1, group2)]
    )
    #   user  system elapsed
    #  0.950   0.050   1.007
    

    My first post, so please be nice ;)


    From data.table v1.9.2, the setDT function is exported, which converts a data.frame to a data.table by reference (in keeping with data.table parlance, all set* functions modify their object by reference). That means no unnecessary copying, which is why it is fast. You can time it, but the overhead will be negligible.

    require(data.table)
    system.time({
      setDT(myDF)   # converts myDF to a data.table in place; no copy is made
      res <- myDF[, weighted.mean(myFact, weights),
                  by = list(year, state, group1, group2)]
    })
    #   user  system elapsed
    #  0.970   0.024   1.015
    

    This is as opposed to 1.264 seconds with the OP's solution above, where data.table(.) is used to create dtb and therefore copies the data.
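
    For reference, a minimal sketch of the copying route being compared against; it assumes the same myDF as in the question, and the data.table(.) call is what makes the extra copy that setDT avoids:

    require(data.table)
    system.time({
      # data.table() copies myDF before keying it; setDT() above skips this copy
      dtb <- data.table(myDF, key = "year,state,group1,group2")
      res <- dtb[, weighted.mean(myFact, weights),
                 by = list(year, state, group1, group2)]
    })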
