I can achieve this task, but I feel like there must be a \"best\" (slickest, most compact, clearest-code, fastest?) way of doing it and have not figured it out so far ...
I'm slightly addicted to speed comparisons even though they're largely irrelevant for me in this situation ...
joran_ddply <- function(d) ddply(d,.(f1,f2,f3),
summarise,y.mean = mean(y),y.var = var(y))
joshulrich_aggregate <- function(d) {
aggregate(d$y, d[,c("f1","f2","f3")],
FUN=function(x) c(mean=mean(x),var=var(x)))
}
formula_aggregate <- function(d) {
aggregate(y~f1*f2*f3,data=d,
FUN=function(x) c(mean=mean(x),var=var(x)))
}
library(data.table)
d2 <- data.table(d)
ramnath_datatable <- function(d) {
d[,list(avg_y = mean(y), var_y = var(y)), 'f1, f2, f3']
}
library(Hmisc)
dwin_hmisc <- function(d) {summary(y ~ interaction(f3,f2,f1),
data=d, method="response",
fun=function(y) c(mean.y=mean(y) ,var.y=var(y) ))
}
library(rbenchmark)
benchmark(joran_ddply(d),
joshulrich_aggregate(d),
ramnath_datatable(d2),
formula_aggregate(d),
dwin_hmisc(d))
aggregate
is fastest (even faster than data.table
, which is a surprise to me, although things might be different with a bigger table to aggregate), even using the formula interface ...)
test replications elapsed relative user.self sys.self
5 dwin_hmisc(d) 100 1.235 2.125645 1.168 0.044
4 formula_aggregate(d) 100 0.703 1.209983 0.656 0.036
1 joran_ddply(d) 100 3.345 5.757315 3.152 0.144
2 joshulrich_aggregate(d) 100 0.581 1.000000 0.596 0.000
3 ramnath_datatable(d2) 100 0.750 1.290878 0.708 0.000
(Now I just need Dirk to step up and post an Rcpp
solution that is 1000 times faster than anything else ...)