Use R dplyr/purrr To Get Chi-square Output Matrices By Group

最后都变了 - submitted on 2019-12-11 15:05:06

Question


I'd like to get chi-square output matrices (e.g., standardized residuals, expected values) by group using elements of the tidyverse. Using the mtcars data set, here's where I've started:

mtcars %>% 
  dplyr::select(vs, am) %>%
  table() %>%
  chisq.test(.) 

This produces the chi-square test statistic. To get the standardized residuals, for example, the only code that has worked for me is this:

mtcars %>% 
  dplyr::select(vs, am) %>%
  table() %>%
  chisq.test(.) -> chi.out

as.data.frame(chi.out$stdres)

  vs am       Freq
1  0  0  0.9523038
2  1  0 -0.9523038
3  0  1 -0.9523038
4  1  1  0.9523038

Ideally, I'd like to get the observed values and the standardized residuals into a dataframe format. Something like this:

cbind(as.data.frame(chi.out$observed),as.data.frame(chi.out$stdres))

  vs am Freq vs am       Freq
1  0  0   12  0  0  0.9523038
2  1  0    7  1  0 -0.9523038
3  0  1    6  0  1 -0.9523038
4  1  1    7  1  1  0.9523038
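
One way to avoid the duplicated vs/am columns in the cbind() output is to name each value column and join on the two keys. This is only a minimal sketch: the names obs, res and combined are made up for illustration, and it relies on the responseName argument of as.data.frame() for tables.

```r
# Run the test on the vs x am contingency table from mtcars
chi.out <- chisq.test(table(mtcars[, c("vs", "am")]))

# Convert each component to long format with its own value-column name
obs <- as.data.frame(chi.out$observed, responseName = "observed")
res <- as.data.frame(chi.out$stdres, responseName = "stdres")

# Join on the key columns so vs and am appear only once
combined <- merge(obs, res, by = c("vs", "am"))
combined
```

The result has one row per vs/am cell with the columns vs, am, observed and stdres.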

Finally, I'd like to do this by group, for example over the cyl column in the mtcars data set. It seems that dplyr and some version of purrr's map, such as map_dfr or map_dfc, would do the trick, but I can't quite pull it together. Thanks in advance.


Answer 1:


So this is my proposal for a solution.

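library(dplyr)
library(tidyr)     # spread() comes from tidyr
library(reshape2)  # melt()

mtcars %>% 
  select(vs, am, cyl) %>%
  table() %>%                                # 3-way table: vs x am x cyl
  apply(3, chisq.test) %>%                   # one chi-square test per cyl slice
  lapply(`[`, c("observed", "stdres")) %>%   # keep only these two components
  melt() %>%                                 # long format; L1 = cyl, L2 = component
  spread(key = L2, value = value) %>%
  rename(cyl = L1) %>%
  select(cyl, vs, am, observed, stdres) %>%
  arrange(cyl)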


   cyl vs am observed     stdres
1    4  0  0        0 -0.6422616
2    4  0  1        1  0.6422616
3    4  1  0        3  0.6422616
4    4  1  1        7 -0.6422616
5    6  0  0        0 -2.6457513
6    6  0  1        3  2.6457513
7    6  1  0        4  2.6457513
8    6  1  1        0 -2.6457513
9    8  0  0       12        NaN
10   8  0  1        2        NaN
11   8  1  0        0        NaN
12   8  1  1        0        NaN

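This runs a chi-square test for each level of cyl. The grouping is implicit: table() builds a three-way table from vs, am and cyl, and apply(3, chisq.test) runs the test on each slice along the third dimension (cyl). In the end you get the observed values and standardized residuals for every combination of cyl, vs and am. The residuals for cyl = 8 are NaN because no car in that group has vs = 1, so the expected counts in that row are zero. The same pattern should carry over to other data frames with two categorical variables and a grouping variable.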

Hope this is what you were looking for.
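
Since the question asked about purrr specifically, the same idea can also be sketched with split() and map(). This is an untested-in-production sketch, not a definitive implementation: chisq_by_cyl and the helper variables are hypothetical names, the factor levels 0:1 are hard-coded for vs and am, and suppressWarnings() is used only to silence the small-expected-count warnings that chisq.test() raises on groups this small.

```r
library(dplyr)
library(purrr)

chisq_by_cyl <- mtcars %>%
  split(.$cyl) %>%                      # one data frame per cyl level
  map(function(d) {
    # Fix the factor levels so every group yields a 2 x 2 table,
    # even when a level (e.g. vs = 1 for cyl = 8) never occurs
    tab <- table(vs = factor(d$vs, levels = 0:1),
                 am = factor(d$am, levels = 0:1))
    out <- suppressWarnings(chisq.test(tab))
    obs <- as.data.frame(out$observed, responseName = "observed")
    res <- as.data.frame(out$stdres, responseName = "stdres")
    left_join(obs, res, by = c("vs", "am"))
  }) %>%
  bind_rows(.id = "cyl")                # list names become the cyl column

chisq_by_cyl
```

The result should have the same columns as the table above (cyl, vs, am, observed, stdres), though the rows within each cyl group may come out in a different order and cyl is stored as character.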



Source: https://stackoverflow.com/questions/55330043/use-r-dplyr-purrr-to-get-chi-square-output-matrices-by-group
