Question
I'd like to make some purrr code more concise. I have a df with one dependent variable (y) and 4 independent variables (x1, x2, x3, x4). I also have one conditioning variable that takes 2 levels (z, is either zero or 1). I'd like to run 8 t-tests: y ~ x1 [z==0], y ~ x2 [z==0] ... y ~ x1 [z==1], y ~ x2 [z==1] etc. I'd like to return a single dataframe with the tidy tests stacked on top of each other.
I really want to generalize this method for more combinations of predictors, so building a formula and using expand.grid seems like the best way to go. I'd also like to use a combination of dplyr/purrr/broom to do this. The following works, but I'm wondering, is there a way to get everything into a single pipe?
library(tidyverse)
library(broom)
df <- data.frame(y = rnorm(100), x1 = sample(0:1, 100, replace = TRUE), x2 = sample(0:1, 100, replace = TRUE), x3 = sample(0:1, 100, replace = TRUE), x4 = sample(0:1, 100, replace = TRUE), z = sample(0:1, 100, replace = TRUE))
ivs <- c("x1", "x2", "x3", "x4")
med <- c(0, 1)
models <- expand.grid(ivs, med) %>% mutate(frm = paste0("y ~ ", Var1))
formula <- models$frm
cond <- models$Var2
models <- map2_df(formula, cond, ~tidy(t.test(as.formula(.x), data=df[df$z==.y,])))
Why, for instance, doesn't the following work?
models <- expand.grid(ivs, med) %>% mutate(frm = paste0("y ~ ", Var1)) %>% map2_df(.$frm, .$Var2, ~tidy(t.test(as.formula(.x), data=df[df$z==.y,])))
Answer 1:
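Your first version worked because formula and cond were passed to map2_df as two separate vectors. That is no longer the case inside the pipe: the left-hand side is a data frame, and because . appears only inside the nested calls .$frm and .$Var2, %>% still inserts the data frame as the first argument. map2_df therefore receives the data frame as .x and .$frm as .y, so you cannot refer to .$frm or .$Var2 this way.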
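A minimal sketch of why the piped call misbehaves, relying on the magrittr rule that the left-hand side is still inserted as the first argument when . is used only inside nested calls such as .$frm. The helper count_args and the data frame grid are hypothetical names introduced for illustration; they are not part of the original question.

```r
library(magrittr)

# hypothetical helper: reports how many arguments it received
count_args <- function(...) length(list(...))

# hypothetical stand-in for the grid built with expand.grid()
grid <- data.frame(a = 1:2, b = 3:4)

# `.` appears only inside `$` calls, so the data frame is also passed first
n_piped <- grid %>% count_args(.$a, .$b)

# wrapping the call in braces suppresses the automatic first-argument insertion
n_braced <- grid %>% { count_args(.$a, .$b) }
```

Here n_piped counts three arguments (the data frame plus the two columns), while n_braced counts only the two columns. Wrapping the original map2_df call in braces in the same way is one possible workaround.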
To make it work, you can use pmap_df to loop over each row of the data frame created inside the pipe and refer to the columns by position using ..1, ..2, ..3, and so on.
library(tidyverse)
library(broom)
df <- data.frame(y = rnorm(100),
                 x1 = sample(0:1, 100, replace = TRUE),
                 x2 = sample(0:1, 100, replace = TRUE),
                 x3 = sample(0:1, 100, replace = TRUE),
                 x4 = sample(0:1, 100, replace = TRUE),
                 z = sample(0:1, 100, replace = TRUE))
ivs <- c("x1", "x2", "x3", "x4")
med <- c(0, 1)
models <- expand.grid(ivs, med) %>%
  mutate(frm = paste0("y ~ ", Var1))
formula <- models$frm
cond <- models$Var2
models <- map2_df(formula, cond, ~ tidy(t.test(as.formula(.x), data = df[df$z == .y, ])))
# pmap_df iterates over the rows of the data frame (a list of columns);
# ..2 is Var2 (the level of z) and ..3 is frm (the formula string)
models2 <- expand.grid(ivs, med) %>%
  mutate(frm = paste0("y ~ ", Var1)) %>%
  pmap_df(., ~ tidy(t.test(as.formula(..3), data = df[df$z == ..2, ])))
models2
#> estimate estimate1 estimate2 statistic p.value parameter
#> 1 0.2039970 -0.002158780 -0.20615579 0.6372003 0.52724597 44.68250
#> 2 -0.4488714 -0.341650359 0.10722106 -1.4646944 0.15052718 41.56782
#> 3 -0.3016148 -0.246980034 0.05463477 -0.9189260 0.36427350 35.86492
#> 4 0.2601315 -0.004184604 -0.26431615 0.8668975 0.39031605 47.94586
#> 5 -0.2303647 -0.099116913 0.13124775 -0.8420942 0.40422649 44.61732
#> 6 0.5992558 0.385767243 -0.21348854 2.0517453 0.04957898 28.21589
#> 7 0.5027880 0.243581778 -0.25920622 1.9502349 0.05803462 40.84076
#> 8 -0.2735021 -0.101687239 0.17181481 -0.9498541 0.34888013 34.04935
#> conf.low conf.high method alternative
#> 1 -0.440936247 0.8489303 Welch Two Sample t-test two.sided
#> 2 -1.067524893 0.1697821 Welch Two Sample t-test two.sided
#> 3 -0.967373762 0.3641441 Welch Two Sample t-test two.sided
#> 4 -0.343220972 0.8634841 Welch Two Sample t-test two.sided
#> 5 -0.781476516 0.3207472 Welch Two Sample t-test two.sided
#> 6 0.001181137 1.1973304 Welch Two Sample t-test two.sided
#> 7 -0.017929386 1.0235054 Welch Two Sample t-test two.sided
#> 8 -0.858637554 0.3116335 Welch Two Sample t-test two.sided
identical(models, models2)
#> [1] TRUE
Created on 2018-03-25 by the reprex package (v0.2.0).
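As a variation on the positional ..2 and ..3 references, pmap-style functions also match list names to function arguments, which can read more clearly as the grid grows. The sketch below is an assumption-laden illustration rather than part of the original answer: the helper run_test, the column names iv and level, the fixed seed, and the reduced two-predictor data are all hypothetical.

```r
library(dplyr)
library(purrr)
library(broom)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- data.frame(
  y  = rnorm(100),
  x1 = sample(0:1, 100, replace = TRUE),
  x2 = sample(0:1, 100, replace = TRUE),
  z  = sample(0:1, 100, replace = TRUE)
)

# hypothetical helper: its argument names must match the grid's column names
run_test <- function(iv, level) {
  tidy(t.test(as.formula(paste("y ~", iv)), data = df[df$z == level, ]))
}

# one row per (predictor, level of z) combination; pmap_dfr row-binds the results
results <- expand.grid(iv = c("x1", "x2"), level = c(0, 1),
                       stringsAsFactors = FALSE) %>%
  pmap_dfr(run_test)
```

Because the columns are matched by name, reordering or adding grid columns only requires changing the signature of run_test rather than renumbering the ..N references.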
Answer 2:
One option is to create a list column with mutate and map2, and then unnest it.
expand.grid(ivs, med) %>%
  mutate(frm = paste0("y ~ ", Var1),
         # one tidy t-test result per row, stored as a list column
         models = map2(frm, Var2,
                       ~ tidy(t.test(as.formula(.x), data = df[df$z == .y, ])))) %>%
  unnest(models)
Output:
#Var1 Var2 frm estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high method alternative
#1 x1 0 y ~ x1 -0.114744071 0.04200976 0.156753835 -0.45597353 0.6507153 42.78050 -0.6223126 0.3928245 Welch Two Sample t-test two.sided
#2 x2 0 y ~ x2 0.172546432 0.17607867 0.003532233 0.70766872 0.4834670 38.01821 -0.3210412 0.6661340 Welch Two Sample t-test two.sided
#3 x3 0 y ~ x3 -0.030023506 0.08421478 0.114238290 -0.12359928 0.9022370 40.96801 -0.5206019 0.4605549 Welch Two Sample t-test two.sided
#4 x4 0 y ~ x4 0.227916033 0.23737142 0.009455385 0.82946231 0.4139292 27.71931 -0.3351932 0.7910253 Welch Two Sample t-test two.sided
#5 x1 1 y ~ x1 -0.296263674 -0.36088080 -0.064617122 -1.03186671 0.3071431 49.56892 -0.8730740 0.2805466 Welch Two Sample t-test two.sided
#6 x2 1 y ~ x2 -0.006999223 -0.20785051 -0.200851283 -0.02445166 0.9805852 52.39709 -0.5812929 0.5672944 Welch Two Sample t-test two.sided
#7 x3 1 y ~ x3 -0.408614666 -0.40526169 0.003352971 -1.45546811 0.1515498 52.00764 -0.9719677 0.1547384 Welch Two Sample t-test two.sided
#8 x4 1 y ~ x4 0.142488951 -0.13990134 -0.282390287 0.48945275 0.6267376 48.28189 -0.4427566 0.7277345 Welch Two Sample t-test two.sided
To use more tidyverse-style syntax, we can replace expand.grid with crossing.
crossing(Var1 = ivs, Var2 = med) %>%
  mutate(frm = paste0("y ~ ", Var1),
         # one tidy t-test result per row, stored as a list column
         models = map2(frm, Var2,
                       ~ tidy(t.test(as.formula(.x), data = df[df$z == .y, ])))) %>%
  unnest(models)
Output:
# A tibble: 8 x 13
# Var1 Var2 frm estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high method alternative
# <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fctr> <fctr>
#1 x1 0 y ~ x1 -0.115 0.0420 0.157 -0.456 0.651 42.8 -0.622 0.393 Welch Two Sample t-test two.sided
#2 x1 1.00 y ~ x1 -0.296 -0.361 -0.0646 -1.03 0.307 49.6 -0.873 0.281 Welch Two Sample t-test two.sided
#3 x2 0 y ~ x2 0.173 0.176 0.00353 0.708 0.483 38.0 -0.321 0.666 Welch Two Sample t-test two.sided
#4 x2 1.00 y ~ x2 -0.00700 -0.208 -0.201 -0.0245 0.981 52.4 -0.581 0.567 Welch Two Sample t-test two.sided
#5 x3 0 y ~ x3 -0.0300 0.0842 0.114 -0.124 0.902 41.0 -0.521 0.461 Welch Two Sample t-test two.sided
#6 x3 1.00 y ~ x3 -0.409 -0.405 0.00335 -1.46 0.152 52.0 -0.972 0.155 Welch Two Sample t-test two.sided
#7 x4 0 y ~ x4 0.228 0.237 0.00946 0.829 0.414 27.7 -0.335 0.791 Welch Two Sample t-test two.sided
#8 x4 1.00 y ~ x4 0.142 -0.140 -0.282 0.489 0.627 48.3 -0.443 0.728 Welch Two Sample t-test two.sided
Source: https://stackoverflow.com/questions/49484484/using-purr-with-expand-grid-to-loop-over-formulas-for-a-t-test-while-conditionin