Using purrr with expand.grid to loop over formulas for a t.test while conditioning on another variable

Question

I'd like to make some purrr code more concise. I have a data frame with one dependent variable (y) and four independent variables (x1, x2, x3, x4). I also have one conditioning variable with two levels (z, which is either 0 or 1). I'd like to run 8 t-tests: y ~ x1 [z==0], y ~ x2 [z==0] ... y ~ x1 [z==1], y ~ x2 [z==1], etc. I'd like to return a single data frame with the tidied tests stacked on top of each other.

I really want to generalize this method for more combinations of predictors, so building a formula and using expand.grid seems like the best way to go. I'd also like to use a combination of dplyr/purrr/broom to do this. The following works, but I'm wondering, is there a way to get everything into a single pipe?

library(tidyverse)
library(broom)

df <- data.frame(y = rnorm(100), x1 = sample(0:1, 100, replace = TRUE), x2 = sample(0:1, 100, replace = TRUE), x3 = sample(0:1, 100, replace = TRUE), x4 = sample(0:1, 100, replace = TRUE), z = sample(0:1, 100, replace = TRUE))

ivs <- c("x1", "x2", "x3", "x4")
med <- c(0, 1)

models <- expand.grid(ivs, med) %>% mutate(frm = paste0("y ~ ", Var1)) 

formula <- models$frm   
cond <- models$Var2

models <-  map2_df(formula, cond, ~tidy(t.test(as.formula(.x), data=df[df$z==.y,])))

For instance, why doesn't the following work?

models <- expand.grid(ivs, med) %>% mutate(frm = paste0("y ~ ", Var1)) %>% map2_df(.$frm, .$Var2, ~tidy(t.test(as.formula(.x), data=df[df$z==.y,])))

Answer 1:

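Your first version worked because formula and cond were passed to map2_df as two separate vectors. That is no longer the case inside the pipe: %>% passes the data frame produced by mutate as the first argument of map2_df, so the data frame lands in the .x slot and .$frm, .$Var2, and the lambda are each pushed one position to the right. You also cannot use .x$frm or .x$Var2 inside the lambda, because map2_df iterates over the columns of a data frame, not its rows.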
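The underlying magrittr rule can be reproduced on a toy vector: when . appears only inside a nested call (such as .$frm or toupper(.)), %>% still inserts the left-hand side as the first argument. A minimal sketch; the names shifted and explicit are illustrative and not from the original post:

```r
library(magrittr)

# `.` appears only inside the nested call toupper(.), so the pipe
# still inserts the vector as the first argument of paste()
shifted <- c("a", "b") %>% paste(toupper(.))

# the call the pipe actually evaluates
explicit <- paste(c("a", "b"), toupper(c("a", "b")))

shifted                       # "a A" "b B"
identical(shifted, explicit)  # TRUE
```

In the question's piped call the same thing happens with map2_df: the data frame is inserted first, ahead of .$frm and .$Var2.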

To make it work in a single pipe, use pmap_df, which iterates over the rows of the data frame created inside the pipe. Inside the lambda, refer to the columns by position with ..1, ..2, ..3, and so on; here ..1 is Var1, ..2 is Var2, and ..3 is frm.

library(tidyverse)
library(broom)

df <- data.frame(y = rnorm(100), x1 = sample(0:1, 100, replace = TRUE), 
                 x2 = sample(0:1, 100, replace = TRUE), 
                 x3 = sample(0:1, 100, replace = TRUE), 
                 x4 = sample(0:1, 100, replace = TRUE), 
                 z = sample(0:1, 100, replace = TRUE))

ivs <- c("x1", "x2", "x3", "x4")
med <- c(0, 1)

models <- expand.grid(ivs, med) %>% 
  mutate(frm = paste0("y ~ ", Var1)) 

formula <- models$frm   
cond <- models$Var2

models <-  map2_df(formula, cond, ~ tidy(t.test(as.formula(.x), data = df[df$z == .y, ])))

# pmap_df treats the data frame as a list of columns and steps through them in parallel, i.e. row by row
models2 <- expand.grid(ivs, med) %>% 
  mutate(frm = paste0("y ~ ", Var1)) %>% 
  pmap_df(., ~ tidy(t.test(as.formula(..3), data = df[df$z == ..2, ])))
models2

#>     estimate    estimate1   estimate2  statistic    p.value parameter
#> 1  0.2039970 -0.002158780 -0.20615579  0.6372003 0.52724597  44.68250
#> 2 -0.4488714 -0.341650359  0.10722106 -1.4646944 0.15052718  41.56782
#> 3 -0.3016148 -0.246980034  0.05463477 -0.9189260 0.36427350  35.86492
#> 4  0.2601315 -0.004184604 -0.26431615  0.8668975 0.39031605  47.94586
#> 5 -0.2303647 -0.099116913  0.13124775 -0.8420942 0.40422649  44.61732
#> 6  0.5992558  0.385767243 -0.21348854  2.0517453 0.04957898  28.21589
#> 7  0.5027880  0.243581778 -0.25920622  1.9502349 0.05803462  40.84076
#> 8 -0.2735021 -0.101687239  0.17181481 -0.9498541 0.34888013  34.04935
#>       conf.low conf.high                  method alternative
#> 1 -0.440936247 0.8489303 Welch Two Sample t-test   two.sided
#> 2 -1.067524893 0.1697821 Welch Two Sample t-test   two.sided
#> 3 -0.967373762 0.3641441 Welch Two Sample t-test   two.sided
#> 4 -0.343220972 0.8634841 Welch Two Sample t-test   two.sided
#> 5 -0.781476516 0.3207472 Welch Two Sample t-test   two.sided
#> 6  0.001181137 1.1973304 Welch Two Sample t-test   two.sided
#> 7 -0.017929386 1.0235054 Welch Two Sample t-test   two.sided
#> 8 -0.858637554 0.3116335 Welch Two Sample t-test   two.sided

identical(models, models2)
#> [1] TRUE

Created on 2018-03-25 by the reprex package (v0.2.0).
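If you would rather keep map2_df, a possible alternative (not part of the original answer) is to wrap the call in braces. With magrittr's %>%, a braced right-hand side does not receive the left-hand side as its first argument, so .$frm and .$Var2 can be passed explicitly. A minimal sketch, assuming a smaller two-predictor data set and a fixed seed; the name models3 is illustrative:

```r
library(dplyr)
library(purrr)
library(broom)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- data.frame(y  = rnorm(100),
                 x1 = sample(0:1, 100, replace = TRUE),
                 x2 = sample(0:1, 100, replace = TRUE),
                 z  = sample(0:1, 100, replace = TRUE))

ivs <- c("x1", "x2")
med <- c(0, 1)

models3 <- expand.grid(ivs, med) %>%
  mutate(frm = paste0("y ~ ", Var1)) %>%
  {
    # inside the braces `.` is the data frame; nothing is inserted automatically
    map2_df(.$frm, .$Var2,
            ~ tidy(t.test(as.formula(.x), data = df[df$z == .y, ])))
  }

models3
```

The result should have one row per combination of predictor and z level (four rows here).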

Answer 2:

One option is to build a list column with map2 inside mutate and then unnest it:

expand.grid(ivs, med) %>% 
  mutate(frm = paste0("y ~ ", Var1),
         models = map2(frm, Var2,
                       ~ tidy(t.test(as.formula(.x), data = df[df$z == .y, ])))) %>% 
  unnest(models)  # name the list column explicitly; recent tidyr versions expect it

Output:

#Var1 Var2    frm     estimate   estimate1    estimate2   statistic   p.value parameter   conf.low conf.high                  method alternative
#1   x1    0 y ~ x1 -0.114744071  0.04200976  0.156753835 -0.45597353 0.6507153  42.78050 -0.6223126 0.3928245 Welch Two Sample t-test   two.sided
#2   x2    0 y ~ x2  0.172546432  0.17607867  0.003532233  0.70766872 0.4834670  38.01821 -0.3210412 0.6661340 Welch Two Sample t-test   two.sided
#3   x3    0 y ~ x3 -0.030023506  0.08421478  0.114238290 -0.12359928 0.9022370  40.96801 -0.5206019 0.4605549 Welch Two Sample t-test   two.sided
#4   x4    0 y ~ x4  0.227916033  0.23737142  0.009455385  0.82946231 0.4139292  27.71931 -0.3351932 0.7910253 Welch Two Sample t-test   two.sided
#5   x1    1 y ~ x1 -0.296263674 -0.36088080 -0.064617122 -1.03186671 0.3071431  49.56892 -0.8730740 0.2805466 Welch Two Sample t-test   two.sided
#6   x2    1 y ~ x2 -0.006999223 -0.20785051 -0.200851283 -0.02445166 0.9805852  52.39709 -0.5812929 0.5672944 Welch Two Sample t-test   two.sided
#7   x3    1 y ~ x3 -0.408614666 -0.40526169  0.003352971 -1.45546811 0.1515498  52.00764 -0.9719677 0.1547384 Welch Two Sample t-test   two.sided
#8   x4    1 y ~ x4  0.142488951 -0.13990134 -0.282390287  0.48945275 0.6267376  48.28189 -0.4427566 0.7277345 Welch Two Sample t-test   two.sided

To keep this in tidyverse syntax, replace expand.grid with crossing from tidyr:

crossing(Var1 = ivs, Var2 = med) %>%
  mutate(frm = paste0("y ~ ", Var1),
         models = map2(frm, Var2,
                       ~ tidy(t.test(as.formula(.x), data = df[df$z == .y, ])))) %>% 
  unnest(models)  # name the list column explicitly; recent tidyr versions expect it

Output:

# A tibble: 8 x 13
#  Var1   Var2 frm    estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high method                  alternative
#  <chr> <dbl> <chr>     <dbl>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>    <dbl>     <dbl> <fctr>                  <fctr>     
#1 x1     0    y ~ x1 -0.115      0.0420   0.157     -0.456    0.651      42.8   -0.622     0.393 Welch Two Sample t-test two.sided  
#2 x1     1.00 y ~ x1 -0.296     -0.361   -0.0646    -1.03     0.307      49.6   -0.873     0.281 Welch Two Sample t-test two.sided  
#3 x2     0    y ~ x2  0.173      0.176    0.00353    0.708    0.483      38.0   -0.321     0.666 Welch Two Sample t-test two.sided  
#4 x2     1.00 y ~ x2 -0.00700   -0.208   -0.201     -0.0245   0.981      52.4   -0.581     0.567 Welch Two Sample t-test two.sided  
#5 x3     0    y ~ x3 -0.0300     0.0842   0.114     -0.124    0.902      41.0   -0.521     0.461 Welch Two Sample t-test two.sided  
#6 x3     1.00 y ~ x3 -0.409     -0.405    0.00335   -1.46     0.152      52.0   -0.972     0.155 Welch Two Sample t-test two.sided  
#7 x4     0    y ~ x4  0.228      0.237    0.00946    0.829    0.414      27.7   -0.335     0.791 Welch Two Sample t-test two.sided  
#8 x4     1.00 y ~ x4  0.142     -0.140   -0.282      0.489    0.627      48.3   -0.443     0.728 Welch Two Sample t-test two.sided
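
The two outputs above list the combinations in a different order. That follows from how the grid is built: expand.grid varies its first argument fastest and, by default, converts character vectors to factors, whereas crossing returns a sorted tibble and keeps character columns as characters. A short sketch comparing the two (the names eg and cr are illustrative):

```r
library(tidyr)

ivs <- c("x1", "x2")
med <- c(0, 1)

eg <- expand.grid(Var1 = ivs, Var2 = med)  # base R
cr <- crossing(Var1 = ivs, Var2 = med)     # tidyr

as.character(eg$Var1)  # "x1" "x2" "x1" "x2": first variable varies fastest
cr$Var1                # "x1" "x1" "x2" "x2": sorted by Var1, then Var2

class(eg$Var1)  # "factor" (stringsAsFactors = TRUE by default)
class(cr$Var1)  # "character"
```

Either grid works with paste0 here, since paste0 uses the factor labels; the difference only matters for row order and for code that relies on the column type.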

Source: https://stackoverflow.com/questions/49484484/using-purr-with-expand-grid-to-loop-over-formulas-for-a-t-test-while-conditionin

Tags

purrr