Question
What would be the best tool/package to use to calculate proportions by subgroups? I thought I could try something like this:
data(mtcars)
library(plyr)
ddply(mtcars, .(cyl), transform, Pct = gear/length(gear))
But the output is not what I want: I would like a result with a number of rows equal to the number of distinct values of cyl. Even if I change it to summarise, I still get the same problem.
I am open to other packages, but I thought plyr would be best, as I would eventually like to build a function around this. Any ideas?
I'd appreciate any help just solving a basic problem like this.
Answer 1:
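library(dplyr)
mtcars %>%
  count(cyl, gear) %>%
  # recent dplyr versions return count() ungrouped, so group by cyl
  # explicitly to get proportions within each cyl
  group_by(cyl) %>%
  mutate(prop = prop.table(n))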
See ?count. Basically, count is a wrapper for summarise with n(), but it does the group by for you. Look at the output of just mtcars %>% count(cyl, gear). Then we add an additional variable with mutate, named prop, which is the result of calling prop.table() on the n variable that count(cyl, gear) created.
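To make those two pieces concrete, here is a small sketch. It assumes a recent dplyr (the .groups argument of summarise() is not available in old releases) and checks that count(cyl, gear) gives the same counts as an explicit group_by() plus summarise(n = n()), and that prop.table() on a plain numeric vector is just each element divided by the sum.

```r
library(dplyr)

# count() versus the explicit group_by() + summarise() it wraps
counted <- mtcars %>% count(cyl, gear)
manual <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(n = n(), .groups = "drop")

# Compare the counts produced by the two approaches
same_counts <- isTRUE(all.equal(counted$n, manual$n))

# prop.table() on a vector is x / sum(x)
x <- c(1, 8, 2)
same_props <- isTRUE(all.equal(prop.table(x), x / sum(x)))

same_counts
same_props
```

Because prop.table(n) divides by sum(n) over whatever rows mutate() sees at once, the grouping in effect at that point decides whether the proportions are within cyl or over the whole table.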
You could turn this into a function using the SE (standard evaluation) version of count(), that is, count_(). Look at the vignette for Non-Standard Evaluation in the dplyr package.
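In more recent dplyr releases the underscored verbs such as count_() are deprecated in favor of tidy evaluation, so a wrapper today would more likely use the embrace operator. The following is only a sketch under that assumption (it needs a dplyr/rlang version that supports {{ }}); prop_by_group, outer and inner are hypothetical names chosen for illustration, not part of any package.

```r
library(dplyr)

# Hypothetical helper: proportion of each `inner` value within each `outer` group.
prop_by_group <- function(data, outer, inner) {
  data %>%
    count({{ outer }}, {{ inner }}) %>%  # one row per outer/inner combination
    group_by({{ outer }}) %>%            # make the grouping explicit
    mutate(prop = n / sum(n)) %>%        # proportions within each outer group
    ungroup()
}

res <- prop_by_group(mtcars, cyl, gear)
res
```

Grouping explicitly inside the helper means the result does not depend on whether count() happens to return grouped output in the installed dplyr version.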
Here's a nice GitHub gist addressing lots of cross-tabulation variants with dplyr and other packages.
Answer 2:
To get frequency within a group:
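library(dplyr)
# group_by(cyl) is explicit because recent dplyr versions return count() ungrouped
mtcars %>% count(cyl, gear) %>% group_by(cyl) %>% mutate(Freq = n/sum(n))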
# Source: local data frame [8 x 4]
# Groups: cyl [3]
#
#     cyl  gear     n       Freq
#   (dbl) (dbl) (int)      (dbl)
# 1     4     3     1 0.09090909
# 2     4     4     8 0.72727273
# 3     4     5     2 0.18181818
# 4     6     3     2 0.28571429
# 5     6     4     4 0.57142857
# 6     6     5     1 0.14285714
# 7     8     3    12 0.85714286
# 8     8     5     2 0.14285714
or equivalently,
mtcars %>% group_by(cyl, gear) %>% summarise(n = n()) %>% mutate(Freq = n/sum(n))
Be careful about what the grouping is at each stage, or your numbers will be off.
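As a quick illustration of that warning, this sketch computes Freq from the same counts twice: once grouped by cyl and once with no grouping. The object names are made up for the example.

```r
library(dplyr)

counts <- mtcars %>% count(cyl, gear)

# Grouped by cyl: Freq sums to 1 within each cyl
within_cyl <- counts %>%
  group_by(cyl) %>%
  mutate(Freq = n / sum(n)) %>%
  ungroup()

# No grouping: Freq sums to 1 over the whole table
overall <- counts %>%
  ungroup() %>%
  mutate(Freq = n / sum(n))

# Per-cyl totals of the grouped version, and the grand total of the ungrouped one
within_cyl %>% group_by(cyl) %>% summarise(total = sum(Freq))
sum(overall$Freq)
```

For the first row (cyl 4, gear 3, n = 1), the grouped version divides by the 11 four-cylinder cars, while the ungrouped version divides by all 32 cars.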
Source: https://stackoverflow.com/questions/37057784/summary-of-proportions-by-group