ddply summarise proportional count

邮差的信 submitted on 2019-12-04 08:15:47

You cannot do it in one ddply call because each summarise call only receives the subset of your data for one specific combination of your group variables. At this lowest level, you do not have access to the intermediate-level total, sum(n). Instead, do it in two steps:

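# Step 1: add the size of each X5employf group to every row
kano_final <- ddply(kano_final, .(X5employf), transform,
                    sum.n = length(X5employf))

# Step 2: count each (X5employf, X5employff) pair and divide by the group size
ddply(kano_final, .(X5employf, X5employff), summarise,
      n = length(X5employff), prop = n / sum.n[1] * 100)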
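To make the two steps concrete without the asker's data, here is a minimal sketch in base R rather than plyr. The data frame `toy` and its values are made up for illustration, and `ave()` and `aggregate()` are swapped in for the two `ddply` calls; only the column names follow the question.

```r
# Toy data standing in for kano_final (values are invented for illustration).
toy <- data.frame(
  X5employf  = c("increase", "increase", "increase", "decrease", "decrease"),
  X5employff = c("1", "1", "3", "5", "6"),
  stringsAsFactors = FALSE
)

# Step 1: attach the size of each X5employf group to every row.
toy$sum.n <- ave(rep(1, nrow(toy)), toy$X5employf, FUN = sum)

# Step 2: count rows per (X5employf, X5employff) pair. sum.n is constant
# within a pair, so it can ride along as a grouping variable.
counts <- aggregate(n ~ X5employf + X5employff + sum.n,
                    data = transform(toy, n = 1), FUN = sum)

# Proportion of each pair within its X5employf group, in percent.
counts$prop <- counts$n / counts$sum.n * 100
print(counts)
```

Within each X5employf value the prop column should add up to 100, which is a quick sanity check for either version.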

Edit: a single ddply call also works if you use table, as you hinted:

ddply(kano_final, .(X5employf), summarise,
      # counts per X5employff within this X5employf group, without empty levels
      n          = Filter(function(x) x > 0, table(X5employff, useNA = "ifany")),
      prop       = 100 * prop.table(n),
      X5employff = names(n))
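What happens inside each group can be seen on a single vector. The sketch below assumes a hypothetical set of X5employff values for one X5employf group (the vector `x` is invented). It leaves out the Filter step because a character vector has no unused factor levels to drop.

```r
# Hypothetical X5employff values belonging to one X5employf group.
x <- c("1", "1", "3", NA)

# Count each distinct value, keeping NA as its own category if present.
n <- table(x, useNA = "ifany")

# Turn the counts into percentages of the group total.
prop <- 100 * prop.table(n)

# Assemble the same three columns the ddply call returns per group.
one_group <- data.frame(X5employff = names(n),
                        n          = as.vector(n),
                        prop       = as.vector(prop))
print(one_group)
```

Because prop.table divides by the total of the table it is given, the group total never has to be computed separately, which is why this version fits in one ddply call.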

Here is an example with dplyr, which does this in one step with short, easy-to-read syntax.

Here d is your data.frame:

library(dplyr)

d %>%
  group_by(X5employf, X5employff) %>%
  summarise(n = n()) %>%        # one row per pair; result is still grouped by X5employf
  mutate(ngr  = sum(n),         # total per X5employf group
         prop = n / ngr * 100)  # share of each pair within its group, in percent

This results in:

Source: local data frame [15 x 5]
Groups: X5employf

   X5employf X5employff  n ngr      prop
1   increase          1 26  44 59.090909
2   increase          2  1  44  2.272727
3   increase          3 15  44 34.090909
4   increase    1 and 8  1  44  2.272727
5   increase         NA  1  44  2.272727
6   decrease          4  1  10 10.000000
7   decrease          5  5  10 50.000000
8   decrease          6  2  10 20.000000
9   decrease          7  1  10 10.000000
10  decrease          8  1  10 10.000000
11      same          4  4  19 21.052632
12      same          5  6  19 31.578947
13      same          6  5  19 26.315789
14      same    6 and 7  3  19 15.789474
15      same          7  1  19  5.263158

What you apparently want is the proportion of each X5employff value within every value of X5employf. However, nothing tells ddply that X5employf and X5employff play different roles; to ddply, they are simply two variables by which to split up the data. Also, since there is one observation per row (i.e. count = 1 for every row of the data), the length of each (X5employf, X5employff) subset equals the sum of its counts.

The simplest "plyr way" to solve your problem that I can think of is the following:

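# Counts for every (X5employf, X5employff) combination; .drop = FALSE keeps the empty ones
result <- ddply(kano_final, .(X5employf, X5employff), summarise,
                n = length(X5employff), .drop = FALSE)
n <- result$n
# Totals per X5employf
n2 <- ddply(kano_final, .(X5employf), summarise, n = length(X5employff))$n
# each = 13 is hard-coded: every X5employf group must contribute 13 rows to result
result <- data.frame(result, prop = n / rep(n2, each = 13) * 100)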

You can also use good old xtabs:

a <- xtabs(~X5employf + X5employff, kano_final)
b <- xtabs(~X5employf, kano_final)
a/matrix(b, nrow=3, ncol=ncol(a))
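As a small check of the xtabs division, the sketch below uses an invented data frame `toy` (only the column names come from the question) and compares the manual division with base R's `prop.table(a, margin = 1)`, which computes row proportions directly. The two agree here because the toy data has no missing values. With NAs in X5employff, xtabs drops those rows from `a` by default but not from `b`, so the results may differ.

```r
# Toy data standing in for kano_final (values are invented for illustration).
toy <- data.frame(
  X5employf  = c("increase", "increase", "increase", "decrease", "decrease"),
  X5employff = c("1", "1", "3", "5", "6")
)

a <- xtabs(~ X5employf + X5employff, toy)  # counts per pair
b <- xtabs(~ X5employf, toy)               # counts per X5employf

# Manual version: matrix() fills column by column, so every column of the
# divisor repeats the row totals in b.
manual <- a / matrix(b, nrow = nrow(a), ncol = ncol(a))

# Built-in version: proportions within each row.
builtin <- prop.table(a, margin = 1)

print(manual)
print(builtin)
```

Using nrow(a) instead of a literal 3 keeps the sketch independent of how many X5employf values the data contains.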