Summary of proportions by group

隐身守侯 posted on 2019-11-29 16:40:31
library(dplyr)

mtcars %>%
  count(cyl, gear) %>%
  mutate(prop = prop.table(n))

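See ?count. Basically, count() is a wrapper for summarise() with n(), but it does the group_by() for you. Look at the output of just mtcars %>% count(cyl, gear). Then we use mutate() to add a variable named prop, which is the result of calling prop.table() on the n column that count(cyl, gear) created. Note that prop.table(n) divides by the sum of n within whatever grouping is in effect after count(): older dplyr versions left the result grouped by cyl, while newer ones return ungrouped output for ungrouped input, so check the grouping if the proportions are not what you expect.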
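As a small illustration of what prop.table() does to a plain vector of counts (the vector below is just the three cyl == 4 counts from mtcars, typed in by hand):

```r
# Counts of cars with cyl == 4 at gear 3, 4 and 5 in mtcars
n <- c(1, 8, 2)

# prop.table() on a vector divides each element by the vector's sum
p <- prop.table(n)
p

# The same thing written out by hand
n / sum(n)
```

Both expressions give each count as a share of the total, which is why n/sum(n) and prop.table(n) are interchangeable inside mutate().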

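You could wrap this in a function using the SE version of count(), that is, count_(). Look at the vignette on Non-Standard Evaluation in the dplyr package. (count_() has since been deprecated; in current dplyr releases the "Programming with dplyr" vignette covers the tidy-evaluation replacement.)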
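One way such a function might look, as a minimal sketch that forwards the grouping columns through ... instead of relying on count_(). The name prop_by is hypothetical and not part of dplyr, and the explicit ungroup() is an assumption that you want each row's share of the whole table:

```r
library(dplyr)

# prop_by() is a hypothetical helper, not part of dplyr.
# Columns passed through ... are forwarded to count() unquoted.
prop_by <- function(data, ...) {
  data %>%
    count(...) %>%
    ungroup() %>%                # drop any grouping left over after count()
    mutate(prop = n / sum(n))    # share of all rows in `data`
}

res <- prop_by(mtcars, cyl, gear)
res
```

Replacing ungroup() with a group_by() on the first column would give within-group proportions instead.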

Here's a nice GitHub gist addressing lots of cross-tabulation variants with dplyr and other packages.

To get frequency within a group:

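library(dplyr)
# group_by(cyl) makes the within-cyl grouping explicit (newer dplyr returns count() ungrouped)
mtcars %>% count(cyl, gear) %>% group_by(cyl) %>% mutate(Freq = n / sum(n))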
# Source: local data frame [8 x 4]
# Groups: cyl [3]
# 
#     cyl  gear     n       Freq
#   (dbl) (dbl) (int)      (dbl)
# 1     4     3     1 0.09090909
# 2     4     4     8 0.72727273
# 3     4     5     2 0.18181818
# 4     6     3     2 0.28571429
# 5     6     4     4 0.57142857
# 6     6     5     1 0.14285714
# 7     8     3    12 0.85714286
# 8     8     5     2 0.14285714

or equivalently,

mtcars %>% group_by(cyl, gear) %>% summarise(n = n()) %>% mutate(Freq = n/sum(n))

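Be careful about what the grouping is at each stage, or your numbers will be off. Here summarise() drops the last grouping variable (gear), so the mutate() that follows computes n/sum(n) within each cyl.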
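A quick way to check which grouping was actually used is to sum the proportions back up. The sketch below sets the grouping explicitly in both cases, so it should not depend on how a given dplyr version groups the output of count(); the object names are only illustrative:

```r
library(dplyr)

# Proportion of each gear within its cyl group
within_cyl <- mtcars %>%
  count(cyl, gear) %>%
  group_by(cyl) %>%
  mutate(Freq = n / sum(n)) %>%
  ungroup()

# Proportion of each cyl/gear combination over the whole table
overall <- mtcars %>%
  count(cyl, gear) %>%
  ungroup() %>%
  mutate(Freq = n / sum(n))

# Within-group proportions sum to 1 inside every cyl
check <- within_cyl %>%
  group_by(cyl) %>%
  summarise(total = sum(Freq))
check

# Overall proportions sum to 1 across all rows
sum(overall$Freq)
```

If the per-cyl totals are not 1, the mutate() ran under a different grouping than intended.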
