How to conduct an ANOVA of several variables taken on individuals separated by multiple grouping variables?

Posted by ぐ巨炮叔叔 on 2021-01-29 08:40:27


I have a data frame similar to the one created by the code below. In this example, measurements of 5 variables are taken on 30 individuals, each identified by ID. The individuals can be separated by any of three grouping variables: GroupVar1, GroupVar2, GroupVar3. For each of the grouping variables, I need to conduct an ANOVA for each of the 5 variables and return the results of each (possibly to a PDF or a separate document?). How can I write a function, or use iteration, to handle this problem and minimize repetition in my code? What is the best way to extract and visualize the results for a large dataset (my real data set has several hundred individuals, and the grouping variables have between 6 and 30 groups each)?

library(tibble)  # provides tibble(), used below to build the example data

GroupVar1 <- rep(c("FL", "GA", "SC", "NC", "VA", "GA"), each = 5)
GroupVar2 <- rep(c("alpha", "beta", "gamma"), each = 10)
GroupVar3 <- rep(c("Bravo", "Charlie", "Delta", "Echo"), times = c(7,8,10,5))
ID <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y","Z", "a","b","c","d")
Var1 <- rnorm(30)
Var2 <- rnorm(30)
Var3 <- rnorm(30)
Var4 <- rnorm(30)
Var5 <- rnorm(30)
data <- tibble(GroupVar1,GroupVar2,GroupVar3,ID,Var1,Var2,Var3,Var4,Var5)

> dput(data[1:10,])
structure(list(Location = structure(c(21L, 21L, 21L, 21L, 21L, 
21L, 21L, 21L, 21L, 21L), .Label = c("ALTE", "ASTR", "BREA", 
"CAMN", "CFU", "COEN", "JENT", "NAT", "NEAU", "NOCO", "OOGG", 
"OPMM", "PING", "PITC", "POMO", "REAN", "ROND", "RTD", "SANT", 
"SMIT", "SUN", "TEAR", "WINC"), class = "factor"), PR = structure(c(16L, 
16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L), .Label = c("ALTE", 
"ASTR", "CF", "CHOW", "JENT", "NAT", "NEAU", "NSE", "OOGG", "PALM", 
"POMO", "REAN", "ROND", "RTD", "SS", "SUN", "WINC"), class = "factor"), 
    Est = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("AS", 
    "CB", "CF", "CS", "OS", "PS", "SS", "WB"), class = "factor"), 
    State = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L
    ), .Label = c("FL", "GA", "MD", "NC", "SC", "VA"), class = "factor"), 
    Year = c(2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
    2017L, 2017L, 2017L), ID = c(90L, 92L, 93L, 95L, 96L, 98L, 
    99L, 100L, 103L, 109L), Sex = structure(c(1L, 2L, 2L, 2L, 
    1L, 1L, 2L, 1L, 1L, 2L), .Label = c("F", "M"), class = "factor"), 
    DOB = c(-0.674706816, 2.10472846, 0.279952847, -0.26959379, 
    -1.243977657, 0.188828771, 0.026530709, 0.483363306, -0.63599302, 
    -0.979506001), Mg = c(-1.409815618, 1.180920604, 0.765102543, 
    1.828057339, -0.689841498, -0.604272366, 0.194867939, -1.015964127, 
    -0.520136693, 0.769042585), Mn7 = c(1.387385913, 0.320582444, 
    -0.490356598, -0.020540649, -0.594210249, -1.119170306, -0.225065868, 
    -1.892064456, -2.434101506, -0.816518662), Cu7 = c(-0.176599651, 
    0.100529267, 1.4967142, 0.094840221, 1.791653259, -0.191723817, 
    -1.526868086, -0.308696916, -2.046613977, -2.228513411), 
    Zn7 = c(-0.338454617, -0.235800727, -0.785876374, 0.114698826, 
    0.202960987, 0.432013987, 0.164099621, 0.609232311, 0.169329098, 
    -0.284402654), Sr7 = c(-0.010929071, -1.616835312, -0.208856, 
    -0.362538736, 1.662066318, -0.893155185, 0.699406559, -0.333176495, 
    -2.026364633, -1.324456127), Ba7 = c(-1.041126455, 0.551165907, 
    0.126849272, -1.069762666, -0.922501551, -1.36095076, 1.57800858, 
    -0.842518997, -1.017894235, 0.265895019)), row.names = c(NA, 
10L), class = "data.frame")


Edited answer based on updated question with dput data:

Assuming the columns representing the grouping variables are 1:5 and 7, and assuming the dependent numeric variables are in columns 8:14, this can be done using a double loop, with no other dependencies:

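tests <- list()
Groups <- c(1:5, 7)   # column indices of the grouping variables
Variables <- 8:14     # column indices of the numeric dependent variables
for(i in Groups) {
  Group <- as.factor(data[[i]])
  for(j in Variables) {
    # Name each test after its response and grouping column, e.g. "Mg_by_State"
    test_name <- paste0(names(data)[j], "_by_", names(data)[i])
    Response <- data[[j]]
    tests[[test_name]] <- anova(lm(Response ~ Group))
  }
}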

Now you can do what you like with all these tests using lapply, such as

lapply(tests, print)
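
As one possible follow-up, the list of tests can be flattened into a single summary table and written to a file. This is only a sketch: it rebuilds a small `tests` list from simulated data so it runs on its own, and the names `summary_table` and `out_file` (and the CSV file name) are hypothetical, not part of the question.

```r
# Simulated stand-in for the real data (hypothetical values)
set.seed(1)
data <- data.frame(
  GroupVar1 = rep(c("FL", "GA", "SC"), each = 10),
  Var1 = rnorm(30),
  Var2 = rnorm(30)
)

# A small list of anova tables, shaped like the `tests` list built by the loop
tests <- list(
  Var1_by_GroupVar1 = anova(lm(Var1 ~ GroupVar1, data = data)),
  Var2_by_GroupVar1 = anova(lm(Var2 ~ GroupVar1, data = data))
)

# Pull the key numbers out of each table: row 1 is the grouping term,
# row 2 is the residuals
summary_rows <- lapply(names(tests), function(test_name) {
  tab <- tests[[test_name]]
  data.frame(
    test     = test_name,
    df_group = tab[["Df"]][1],
    df_resid = tab[["Df"]][2],
    F_value  = tab[["F value"]][1],
    p_value  = tab[["Pr(>F)"]][1]
  )
})
summary_table <- do.call(rbind, summary_rows)

# Write the summary to a CSV file in the session's temporary directory
out_file <- file.path(tempdir(), "anova_summary.csv")
write.csv(summary_table, out_file, row.names = FALSE)
print(summary_table)
```

One row per test keeps the output manageable when there are dozens of variable-by-group combinations, and the CSV can be opened in a spreadsheet or pulled into a report.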

I agree with @DaveGruenewald about multiple hypothesis testing though. In fact, this example gave a nice demonstration of why Bonferroni or Šidák corrections are needed, since there were (as expected) a few "significant" p-values among the random data simply due to the number of tests involved.
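
A minimal sketch of such a correction, assuming a `tests` list of anova tables like the one above. The data here are simulated and the column names are placeholders. Bonferroni and Holm adjustments come from base R's `p.adjust`; the Šidák adjustment is not one of its methods, so it is computed by hand as 1 - (1 - p)^m for m tests.

```r
# Simulated stand-in for the real data (hypothetical values)
set.seed(1)
data <- data.frame(
  GroupVar1 = rep(c("FL", "GA", "SC"), each = 10),
  GroupVar2 = rep(c("alpha", "beta"), times = 15),
  Var1 = rnorm(30),
  Var2 = rnorm(30)
)

# Same double loop as above, indexing columns by name instead of position
tests <- list()
for (g in c("GroupVar1", "GroupVar2")) {
  Group <- as.factor(data[[g]])
  for (v in c("Var1", "Var2")) {
    Response <- data[[v]]
    tests[[paste0(v, "_by_", g)]] <- anova(lm(Response ~ Group))
  }
}

# The first row of each anova table is the Group term; "Pr(>F)" is its p-value
p_raw <- sapply(tests, function(tab) tab[["Pr(>F)"]][1])
m <- length(p_raw)  # number of tests being corrected for

p_table <- data.frame(
  test         = names(p_raw),
  p_raw        = unname(p_raw),
  p_bonferroni = p.adjust(p_raw, method = "bonferroni"),
  p_holm       = p.adjust(p_raw, method = "holm"),
  p_sidak      = 1 - (1 - p_raw)^m,  # Sidak adjustment, computed manually
  row.names    = NULL
)
print(p_table)
```

Adjusting across the whole family of tests at once is a design choice; whether that family should be all tests or only the tests within one grouping variable depends on the question being asked.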


Without knowing too much about the underlying data, my hunch is this may be improper use of an ANOVA. I would advise that you post to Cross Validated to confirm you aren't breaking any assumptions here.

Regardless, here is the code I would use to tackle the problem presented:

# We will use dplyr, tidyr, purrr, stats, and broom to accomplish this
# I am using tidyr v1.0.0.  For older versions you will need to modify code for pivot_longer
library(dplyr)
library(tidyr)
library(purrr)

results <- data %>% 
  # First pivot the data longer so each dependent variable is on its own row
  pivot_longer(
    cols = Var1:Var5,
    names_to = "name",
    values_to = "value"
  ) %>% 
  # Second, pivot longer again, so each row is now its unique grouping var
  pivot_longer(
    cols = GroupVar1:GroupVar3, 
    names_to = "group_name",
    values_to = "group_value"
  ) %>% 
  # group by both group name and dependent variable
  group_by(name, group_name) %>% 
  # nest the data, so each dataset is unique for each dependent and independent variable
  nest() %>% 
  mutate(
    # run an anova on each nested data frame
    anova = map(data, ~aov(data = .x, value ~ group_value)), # may need to change aov() call here
    # use broom to tidy the output
    tidied_results = map(anova, broom::tidy)
  )

# To easily access the ANOVA results, you can do something like the following:

results %>% 
  # select columns of interest
  select(name, group_name, tidied_results) %>% 
  # unnest to access summary information of ANOVA
  unnest(cols = c(tidied_results))

I think you'll also want to use some sort of multiple-comparison correction, such as the Bonferroni correction. Again, Cross Validated can lead you in the right direction here.
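
A sketch of how that correction might be attached to the tidied output, assuming the same pivot-and-nest pipeline as above, run here on a small simulated data set. The column name `p_bonferroni` is hypothetical. The `ungroup()` call matters: without it, `p.adjust` would run separately inside each one-row group and leave every p-value unchanged.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Simulated stand-in for the real data (hypothetical values)
set.seed(42)
data <- tibble::tibble(
  GroupVar1 = rep(c("FL", "GA", "SC"), each = 10),
  GroupVar2 = rep(c("alpha", "beta"), times = 15),
  Var1 = rnorm(30),
  Var2 = rnorm(30)
)

adjusted <- data %>%
  pivot_longer(cols = Var1:Var2, names_to = "name", values_to = "value") %>%
  pivot_longer(cols = GroupVar1:GroupVar2,
               names_to = "group_name", values_to = "group_value") %>%
  group_by(name, group_name) %>%
  nest() %>%
  # fit and tidy one ANOVA per response / grouping-variable pair
  mutate(tidied = map(data, ~ broom::tidy(aov(value ~ group_value, data = .x)))) %>%
  select(name, group_name, tidied) %>%
  unnest(cols = c(tidied)) %>%
  # drop the grouping so the adjustment sees all p-values together
  ungroup() %>%
  # keep the row for the grouping term, not the residuals row
  filter(term == "group_value") %>%
  mutate(p_bonferroni = p.adjust(p.value, method = "bonferroni"))

print(adjusted)
```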

