Question
I'd like to get chi-square output matrices (e.g., standardized residuals, expected values) by group using elements of the tidyverse. Using the mtcars data set, here's where I've started:
mtcars %>%
dplyr::select(vs, am) %>%
table() %>%
chisq.test(.)
This produces the chi-square test statistic. To get standardized residuals, for example, the only code I have gotten to work is this:
mtcars %>%
dplyr::select(vs, am) %>%
table() %>%
chisq.test(.) -> chi.out
as.data.frame(chi.out$stdres)
vs am Freq
1 0 0 0.9523038
2 1 0 -0.9523038
3 0 1 -0.9523038
4 1 1 0.9523038
Ideally, I'd like to get the observed values and the standardized residuals into a dataframe format. Something like this:
cbind(as.data.frame(chi.out$observed),as.data.frame(chi.out$stdres))
vs am Freq vs am Freq
1 0 0 12 0 0 0.9523038
2 1 0 7 1 0 -0.9523038
3 0 1 6 0 1 -0.9523038
4 1 1 7 1 1 0.9523038
Finally, I'd like to do this by group, for example over the cyl column in the mtcars data set. It seems that dplyr and some version of purrr's map, such as map_dfr or map_dfc, would do the trick, but I can't quite pull it together. Thanks in advance.
Answer 1:
So this is my proposal for a solution.
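library(dplyr)
library(tidyr)     # provides spread()
library(reshape2)  # provides melt()
mtcars %>%
select(vs, am, cyl) %>%
table() %>%                               # three-way table: vs x am x cyl
apply(3, chisq.test) %>%                  # one test per cyl slice
lapply(`[`, c("observed", "stdres")) %>%  # keep elements 6 and 9 of each result
melt() %>%                                # long format: L1 = cyl, L2 = element name
spread(key = L2, value = value) %>%
rename(cyl = L1) %>%
select(cyl, vs, am, observed, stdres) %>%
arrange(cyl)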
cyl vs am observed stdres
1 4 0 0 0 -0.6422616
2 4 0 1 1 0.6422616
3 4 1 0 3 0.6422616
4 4 1 1 7 -0.6422616
5 6 0 0 0 -2.6457513
6 6 0 1 3 2.6457513
7 6 1 0 4 2.6457513
8 6 1 1 0 -2.6457513
9 8 0 0 12 NaN
10 8 0 1 2 NaN
11 8 1 0 0 NaN
12 8 1 1 0 NaN
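This runs a chi-square test for each level of cyl. The grouping comes from including cyl in the select() call: table() then builds a three-way array, and apply(3, chisq.test) tests each cyl slice separately. The result holds the observed counts and standardized residuals for every combination of cyl, vs, and am. The residuals for cyl = 8 are NaN because that group contains no cars with vs = 1, so the corresponding expected counts are zero. The same approach should be applicable to any data frame.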
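Since the question asked for a purrr-based route, here is a minimal sketch of the same by-group idea using split() and purrr::map(). It is an illustration under stated assumptions, not a tested drop-in replacement: chisq_matrices is a hypothetical helper name, and converting vs and am to factors before splitting is my own choice so that every group keeps the full 2 x 2 layout even when a level is missing.

```r
library(dplyr)
library(purrr)

# Hypothetical helper (name is an assumption): run chisq.test() on the
# vs-by-am table of one data frame and return the observed counts and
# standardized residuals side by side.
chisq_matrices <- function(df) {
  tab <- table(vs = df$vs, am = df$am)
  # Small groups trigger approximation warnings; silence them here.
  res <- suppressWarnings(chisq.test(tab))
  out <- as.data.frame(res$observed, responseName = "observed")
  # Both tables share the same dimnames, so the row order matches.
  out$stdres <- as.data.frame(res$stdres)$Freq
  out
}

by_cyl <- mtcars %>%
  # Factors keep empty levels, so each cyl group yields a 2 x 2 table.
  mutate(vs = factor(vs), am = factor(am)) %>%
  split(.$cyl) %>%
  map(chisq_matrices) %>%
  bind_rows(.id = "cyl")

by_cyl
```

The result has one row per combination of cyl, vs, and am, with columns cyl, vs, am, observed, and stdres. map_dfr(chisq_matrices, .id = "cyl") would combine the last two steps into one.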
Hope this is what you were looking for.
Source: https://stackoverflow.com/questions/55330043/use-r-dplyr-purrr-to-get-chi-square-output-matrices-by-group