quick/elegant way to construct mean/variance summary table

甜味超标 2020-12-13 19:45

I can achieve this task, but I feel like there must be a "best" (slickest, most compact, clearest-code, fastest?) way of doing it and have not figured it out so far ...

8 answers
  •  轮回少年
    2020-12-13 20:30

    I'm slightly addicted to speed comparisons even though they're largely irrelevant for me in this situation ...

    # Assumes `d` is the data frame from the question, with grouping
    # columns f1, f2, f3 and a numeric column y.
    library(plyr)
    joran_ddply <- function(d) ddply(d, .(f1, f2, f3),
                                     summarise, y.mean = mean(y), y.var = var(y))

    joshulrich_aggregate <- function(d) {
      aggregate(d$y, d[, c("f1", "f2", "f3")],
                FUN = function(x) c(mean = mean(x), var = var(x)))
    }

    formula_aggregate <- function(d) {
      aggregate(y ~ f1 * f2 * f3, data = d,
                FUN = function(x) c(mean = mean(x), var = var(x)))
    }

    library(data.table)
    d2 <- data.table(d)
    ramnath_datatable <- function(d) {
      d[, list(avg_y = mean(y), var_y = var(y)), by = "f1,f2,f3"]
    }

    library(Hmisc)
    dwin_hmisc <- function(d) {
      summary(y ~ interaction(f3, f2, f1),
              data = d, method = "response",
              fun = function(y) c(mean.y = mean(y), var.y = var(y)))
    }

    # Time each approach (100 replications by default).
    library(rbenchmark)
    benchmark(joran_ddply(d),
              joshulrich_aggregate(d),
              ramnath_datatable(d2),
              formula_aggregate(d),
              dwin_hmisc(d))
    

    aggregate is fastest, even when using the formula interface. It even beats data.table, which is a surprise to me, although things might be different with a bigger table to aggregate ...

                         test replications elapsed relative user.self sys.self
    5           dwin_hmisc(d)          100   1.235 2.125645     1.168    0.044
    4    formula_aggregate(d)          100   0.703 1.209983     0.656    0.036
    1          joran_ddply(d)          100   3.345 5.757315     3.152    0.144
    2 joshulrich_aggregate(d)          100   0.581 1.000000     0.596    0.000
    3   ramnath_datatable(d2)          100   0.750 1.290878     0.708    0.000
    

    (Now I just need Dirk to step up and post an Rcpp solution that is 1000 times faster than anything else ...)
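
    For anyone who wants to try the aggregate approach without the original data, here is a minimal base-R sketch. The data frame below is made up for illustration (the sizes, level names, and seed are assumptions, not the data behind the timings above). It also shows one way to flatten the result, since aggregate stores the mean/variance pair as a single matrix column when FUN returns a vector.

```r
# Hypothetical test data: three grouping columns and a numeric response.
# (Not the data used for the timings above.)
set.seed(1)
n <- 1000
d <- data.frame(
  f1 = sample(c("a", "b"), n, replace = TRUE),
  f2 = sample(c("c", "d", "e"), n, replace = TRUE),
  f3 = sample(c("x", "y"), n, replace = TRUE),
  y  = rnorm(n)
)

# FUN returns a named vector, so the result carries one matrix column "y"
# with sub-columns "mean" and "var".
res <- aggregate(y ~ f1 + f2 + f3, data = d,
                 FUN = function(x) c(mean = mean(x), var = var(x)))

# Flatten the matrix column into ordinary columns y.mean and y.var.
flat <- do.call(data.frame, res)

# Cross-check the first group against a direct calculation.
g <- d$y[d$f1 == flat$f1[1] & d$f2 == flat$f2[1] & d$f3 == flat$f3[1]]
stopifnot(isTRUE(all.equal(flat$y.mean[1], mean(g))),
          isTRUE(all.equal(flat$y.var[1], var(g))))
```

    Increasing n in this sketch is one way to check whether the ranking changes on a bigger table; no outcome is claimed here.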
