Question
I have trouble understanding how to use the bootstrap function (from broom) together with dplyr.
What I want is to generate a bootstrap distribution from two randomly assigned groups and compute the difference in means, for example like this:
library(dplyr)
library(broom)
data(mtcars)
mtcars %>%
  mutate(treat = sample(c(0, 1), 32, replace = TRUE)) %>%
  group_by(treat) %>%
  summarise(m = mean(disp)) %>%
  summarise(m = m[treat == 1] - m[treat == 0])
The issue is that I need to repeat this operation 100, 1000, or more times.
Using replicate, I can do
frep = function(mtcars) mtcars %>%
  mutate(treat = sample(c(0, 1), 32, replace = TRUE)) %>%
  group_by(treat) %>%
  summarise(m = mean(disp)) %>%
  summarise(m = m[treat == 1] - m[treat == 0])
replicate(1000, frep(mtcars = mtcars), simplify = TRUE) %>% unlist()
and get the distribution.
I don't really understand how to use bootstrap here. How should I start?
mtcars %>%
bootstrap(10) %>%
mutate(treat = sample(c(0, 1), 32, replace = T))
mtcars %>%
bootstrap(10) %>%
do(tidy(treat = sample(c(0, 1), 32, replace = T)))
Neither attempt works. Where should I put the bootstrap step in the pipe?
Thanks.
Answer 1:
In the do step, we wrap each resample in data.frame and create the 'treat' column; then we can group by 'replicate' and 'treat' to get the summarised output column:
mtcars %>%
  bootstrap(10) %>%
  do(data.frame(., treat = sample(c(0, 1), 32, replace = TRUE))) %>%
  group_by(replicate, treat) %>%
  summarise(m = mean(disp)) %>%
  summarise(m = m[treat == 1] - m[treat == 0])
# or, as 0 occurs first and 1 second within each replicate, we can also use
# summarise(m = last(m) - first(m))
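In case bootstrap() is not available in your installed version of broom (an assumption about your setup), the same idea can be sketched with dplyr alone. This is only a rough equivalent, not the broom implementation: it resamples the rows of mtcars with replacement, tags each resample with a 'replicate' id, and then reuses the grouping logic above. The names n_rep, boot and diffs are made up for this illustration, and set.seed is there only to make the sketch reproducible.

```r
library(dplyr)

set.seed(1)    # assumption: fixed seed only so the sketch is reproducible
n_rep <- 10    # hypothetical number of bootstrap replicates

# Build n_rep resampled copies of mtcars, each tagged with a replicate id
# and a randomly assigned 'treat' column, then stack them.
boot <- bind_rows(
  lapply(seq_len(n_rep), function(i) {
    mtcars %>%
      slice_sample(prop = 1, replace = TRUE) %>%   # resample rows with replacement
      mutate(replicate = i,
             treat = sample(c(0, 1), n(), replace = TRUE))
  })
)

# One difference in group means per replicate
diffs <- boot %>%
  group_by(replicate, treat) %>%
  summarise(m = mean(disp), .groups = "drop_last") %>%
  summarise(m = m[treat == 1] - m[treat == 0])
```

diffs then holds one row per replicate; raising n_rep to 1000 or more gives the distribution asked for in the question. Note that slice_sample requires dplyr 1.0.0 or later.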
Source: https://stackoverflow.com/questions/39548923/r-dplyr-bootstrap-issue