quick/elegant way to construct mean/variance summary table


甜味超标 2020-12-13 19:45

I can achieve this task, but I feel like there must be a "best" (slickest, most compact, clearest-code, fastest?) way of doing it, and I have not figured it out so far ...

8 answers

轮回少年 (original poster)

2020-12-13 20:30

I'm slightly addicted to speed comparisons even though they're largely irrelevant for me in this situation ...

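# `d` is assumed to be the data frame from the question
# (factors f1, f2, f3; numeric response y)
library(plyr)  # provides ddply() and summarise()
joran_ddply <- function(d) {
  ddply(d, .(f1, f2, f3), summarise,
        y.mean = mean(y), y.var = var(y))
}
joshulrich_aggregate <- function(d) {
  aggregate(d$y, d[, c("f1", "f2", "f3")],
            FUN = function(x) c(mean = mean(x), var = var(x)))
}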

formula_aggregate <- function(d) {
  aggregate(y~f1*f2*f3,data=d,
            FUN=function(x) c(mean=mean(x),var=var(x)))
}
library(data.table)
d2 <- data.table(d)
ramnath_datatable <- function(d) {
  # group by the three factors; `by` is given as a character vector
  d[, list(avg_y = mean(y), var_y = var(y)), by = c("f1", "f2", "f3")]
}


library(Hmisc)
dwin_hmisc <- function(d) {
  summary(y ~ interaction(f3, f2, f1),
          data = d, method = "response",
          fun = function(y) c(mean.y = mean(y), var.y = var(y)))
}


library(rbenchmark)
benchmark(joran_ddply(d),
          joshulrich_aggregate(d),
          ramnath_datatable(d2),
          formula_aggregate(d),
          dwin_hmisc(d))

aggregate is fastest, even when using the formula interface. It is even faster than data.table, which is a surprise to me, although things might be different with a bigger table to aggregate ...

                     test replications elapsed relative user.self sys.self
5           dwin_hmisc(d)          100   1.235 2.125645     1.168    0.044
4    formula_aggregate(d)          100   0.703 1.209983     0.656    0.036
1          joran_ddply(d)          100   3.345 5.757315     3.152    0.144
2 joshulrich_aggregate(d)          100   0.581 1.000000     0.596    0.000
3   ramnath_datatable(d2)          100   0.750 1.290878     0.708    0.000
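
A note on reading the table: the relative column appears to be each elapsed time divided by the smallest elapsed time, so the fastest entry gets 1. A minimal sketch that recomputes it from the elapsed values above (the vector names are only labels I chose for illustration):

```r
# Elapsed times (seconds for 100 replications) copied from the table above
elapsed <- c(dwin_hmisc           = 1.235,
             formula_aggregate    = 0.703,
             joran_ddply          = 3.345,
             joshulrich_aggregate = 0.581,
             ramnath_datatable    = 0.750)

# Each timing relative to the fastest one
relative <- elapsed / min(elapsed)

# Fastest first, rounded for readability
round(sort(relative), 3)
```

So ddply comes out a bit under 6 times slower than the plain aggregate call here, and the formula interface costs roughly 20% on top of it.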

(Now I just need Dirk to step up and post an Rcpp solution that is 1000 times faster than anything else ...)
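
On the caveat about a bigger table: here is a rough sketch of how one might check it. The data are made up (make_data, its level names and the sizes are my assumptions, not the data from the question), and it uses a single system.time run rather than rbenchmark, so treat any timings as indicative only. The data.table part only runs if the package is installed.

```r
# Hypothetical test data: 5 x 5 x 5 factor combinations, n_rep rows per cell
set.seed(101)
make_data <- function(n_rep) {
  d <- expand.grid(f1 = factor(letters[1:5]),
                   f2 = factor(LETTERS[1:5]),
                   f3 = factor(1:5),
                   rep = seq_len(n_rep))
  d$y <- rnorm(nrow(d))
  d
}

# Same idea as formula_aggregate above: mean and variance per cell
agg_summary <- function(d) {
  aggregate(y ~ f1 + f2 + f3, data = d,
            FUN = function(x) c(mean = mean(x), var = var(x)))
}

small <- make_data(4)     # 500 rows
big   <- make_data(400)   # 50,000 rows

# One timed run per size
t_small <- system.time(res_small <- agg_summary(small))["elapsed"]
t_big   <- system.time(res_big   <- agg_summary(big))["elapsed"]

# Optional comparison with data.table on the bigger table
if (requireNamespace("data.table", quietly = TRUE)) {
  big_dt <- data.table::as.data.table(big)
  t_dt <- system.time(
    res_dt <- big_dt[, list(avg_y = mean(y), var_y = var(y)),
                     by = c("f1", "f2", "f3")]
  )["elapsed"]
}
```

Note that aggregate stores the two statistics as a matrix column (res_big$y has columns mean and var); do.call(data.frame, res_big) flattens it into ordinary columns if that is more convenient.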
