R: speeding up “group by” operations

挽巷 · 2020-11-28 19:30

I have a simulation that has a huge aggregate-and-combine step right in the middle. I prototyped this process using plyr's ddply() function, which works great for a huge percentage of my needs, but I need this aggregation step to be faster.
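
For concreteness, here is a minimal sketch of the kind of data and ddply() call I have in mind; the column names match the grouping columns used in the answers below, but the sizes and values are made up for illustration:

    library(plyr)

    # illustrative sample data; sizes and distributions are assumptions, not the real simulation
    n <- 2e5
    myDF <- data.frame(
      year    = sample(2000:2010, n, replace = TRUE),
      state   = sample(state.abb, n, replace = TRUE),
      group1  = sample(letters[1:5], n, replace = TRUE),
      group2  = sample(letters[1:5], n, replace = TRUE),
      myFact  = rnorm(n),
      weights = runif(n)
    )

    # the plyr prototype: one weighted mean per year/state/group1/group2 cell
    system.time(
      res <- ddply(myDF, .(year, state, group1, group2), summarise,
                   wmean = weighted.mean(myFact, weights))
    )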

5 Answers
  •  死守一世寂寞
    2020-11-28 20:05

    Further 2x speedup and more concise code:

    library(data.table)
    # build a keyed data.table; the key columns are also the grouping columns
    dtb <- data.table(myDF, key = "year,state,group1,group2")
    system.time(
      res <- dtb[, weighted.mean(myFact, weights),
                 by = list(year, state, group1, group2)]
    )
    #   user  system elapsed
    #  0.950   0.050   1.007
    

    My first post, so please be nice ;)


    From data.table v1.9.2, the setDT function is exported, which converts a data.frame to a data.table by reference (in keeping with data.table parlance, all set* functions modify their object by reference). That means no unnecessary copying, which is why it is fast. You can time it, but the overhead will be negligible.

    require(data.table)
    system.time({
      setDT(myDF)   # converts myDF to a data.table in place; no copy is made
      res <- myDF[, weighted.mean(myFact, weights),
                  by = list(year, state, group1, group2)]
    })
    #   user  system elapsed
    #  0.970   0.024   1.015
    

    This is as opposed to 1.264 seconds with the OP's solution above, where data.table(.) is used to create dtb and therefore copies the data.
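
    For reference, a minimal sketch of the copying route being compared against; it assumes the same myDF as in the question, and the data.table(.) call is what makes the extra copy that setDT avoids:

    require(data.table)
    system.time({
      # data.table() copies myDF before keying it; setDT() above skips this copy
      dtb <- data.table(myDF, key = "year,state,group1,group2")
      res <- dtb[, weighted.mean(myFact, weights),
                 by = list(year, state, group1, group2)]
    })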
