Fast post hoc computation using R

后端 未结 1 942
盖世英雄少女心
盖世英雄少女心 2021-01-23 18:25

I have a large dataset which I would like to perform post hoc computation:

dat = as.data.frame(matrix(runif(10000*300), ncol = 10000, nrow = 300))

dat$group = r         


        
1条回答
  •  难免孤独
    2021-01-23 19:21

    Your example is too big. For illustration of the idea I use a small one.

    set.seed(0)
    dat = as.data.frame(matrix(runif(2*300), ncol = 2, nrow = 300))
    dat$group = rep(letters[1:3], 100)
    

    Why do you call aov on a fitted "lm" model? That basically refits the same model.

    Have a read on Fitting a linear model with multiple LHS first. lm is the workhorse of aov, so you can pass a multiple LHS formula to aov. The model has class c("maov", "aov", "mlm", "lm").

    response_names <- names(dat)[-ncol(dat)]
    form <- as.formula(sprintf("cbind(%s) ~ group", toString(response_names)))
    fit <- do.call("aov", list(formula = form, data = quote(dat)))
    

    Now the issue is: there is no "maov" method for TuckyHSD. So we need a hacking.

    TuckyHSD relies on the residuals of the fitted model. In c("aov", "lm") case the residuals is a vector, but in c("maov", "aov", "mlm", "lm") case it is a matrix. The following demonstrates the hacking.

    aov_hack <- fit
    aov_hack[c("coefficients", "fitted.values")] <- NULL  ## don't need them
    aov_hack[c("contrasts", "xlevels")] <- NULL  ## don't need them either
    attr(aov_hack$model, "terms") <- NULL  ## don't need it
    class(aov_hack) <- c("aov", "lm")  ## drop "maov" and "mlm"
    ## the following elements are mandatory for `TukeyHSD`
    ## names(aov_hack)
    #[1] "residuals"   "effects"     "rank"        "assign"      "qr"         
    #[6] "df.residual" "call"        "terms"       "model" 
    
    N <- length(response_names)  ## number of response variables
    result <- vector("list", N)
    for (i in 1:N) {
      ## change response variable in the formula
      aov_hack$call[[2]][[2]] <- as.name(response_names[i])
      ## change residuals
      aov_hack$residuals <- fit$residuals[, i]
      ## change effects
      aov_hack$effects <- fit$effects[, i]
      ## change "terms" object and attribute
      old_tm <- terms(fit)  ## old "terms" object in the model
      old_tm[[2]] <- as.name(response_names[i])  ## change response name in terms
      new_tm <- terms.formula(formula(old_tm))  ## new "terms" object
      aov_hack$terms <- new_tm  ## replace `aov_hack$terms`
      ## replace data in the model frame
      aov_hack$model[1] <- data.frame(fit$model[[1]][, i])
      names(aov_hack$model)[1] <- response_names[i]
      ## run `TukeyHSD` on `aov_hack`
      result[[i]] <- TukeyHSD(aov_hack)
      }
    

    result[[1]]  ## for example
    #  Tukey multiple comparisons of means
    #    95% family-wise confidence level
    #
    #Fit: aov(formula = V1 ~ group, data = dat)
    #
    #$group
    #            diff        lwr        upr     p adj
    #b-a -0.012743870 -0.1043869 0.07889915 0.9425847
    #c-a -0.022470482 -0.1141135 0.06917254 0.8322109
    #c-b -0.009726611 -0.1013696 0.08191641 0.9661356
    

    I have used a "for" loop. Replace it with a lapply if you want.

    0 讨论(0)
提交回复
热议问题