I have a simulation that has a huge aggregate and combine step right in the middle. I prototyped this process using plyr's ddply() function, which works great for a huge percentage of my needs.
Further 2x speedup and more concise code:
library(data.table)

# Build a keyed data.table from myDF; the key pre-sorts rows by the
# grouping columns (note that data.table(.) copies myDF)
dtb <- data.table(myDF, key = "year,state,group1,group2")

system.time(
  res <- dtb[, weighted.mean(myFact, weights),
             by = list(year, state, group1, group2)]
)
# user system elapsed
# 0.950 0.050 1.007
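For anyone wanting to run this end to end: myDF isn't defined in the thread, so here is a minimal stand-in with the same column names (year, state, group1, group2, myFact, weights); the sizes and distributions are my own assumptions:

# Hypothetical stand-in for the question's myDF (assumed shapes/sizes)
set.seed(42)
n <- 1e6
myDF <- data.frame(
  year    = sample(2000:2010, n, replace = TRUE),
  state   = sample(state.abb, n, replace = TRUE),
  group1  = sample(letters[1:5], n, replace = TRUE),
  group2  = sample(LETTERS[1:5], n, replace = TRUE),
  myFact  = rnorm(n),
  weights = runif(n)
)

As a side note, the default V1 column in res can be given a proper name by wrapping the j expression in list():

res <- dtb[, list(wmean = weighted.mean(myFact, weights)),
           by = list(year, state, group1, group2)]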
My first post, so please be nice ;)
From data.table v1.9.2, the setDT function is exported, which converts a data.frame to a data.table by reference (in keeping with data.table parlance, all set* functions modify the object by reference). This means no unnecessary copying, and it is therefore fast. You can time it, but the cost will be negligible.
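A quick way to see the by-reference behaviour, using a small throwaway data.frame of my own (not from the original post):

library(data.table)

df <- data.frame(a = 1:3)
class(df)   # "data.frame"
setDT(df)   # converted in place; no new object is allocated
class(df)   # "data.table" "data.frame"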
require(data.table)

system.time({
  setDT(myDF)   # converts myDF to a data.table in place; no copy
  res <- myDF[, weighted.mean(myFact, weights),
              by = list(year, state, group1, group2)]
})
# user system elapsed
# 0.970 0.024 1.015
This is as opposed to 1.264 seconds with the OP's solution above, where data.table(.) is used to create dtb, copying the data in the process.
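If you want to reproduce the comparison on your own machine (timings are hardware-dependent; this is just a sketch), time both approaches on a fresh data.frame copy:

myDF2 <- as.data.frame(myDF)   # fresh data.frame for a fair test

# data.table(.) approach: copies the data before grouping
system.time({
  dtb <- data.table(myDF2, key = "year,state,group1,group2")
  dtb[, weighted.mean(myFact, weights), by = list(year, state, group1, group2)]
})

# setDT approach: converts in place, no copy
system.time({
  setDT(myDF2)
  myDF2[, weighted.mean(myFact, weights), by = list(year, state, group1, group2)]
})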