R calculating grouped frequency table with percentage [duplicate]

天大地大妈咪最大 submitted on 2021-02-11 12:29:29

Question


Given the following data.frame, I would like to calculate the occurrence of each value of VAR and the percentage of these occurrences within each level of the grouping variable GROUP:

GROUP<-c("G1","G2","G1","G2","G3","G3","G1")
VAR<-c("A","B","B","A","B","B","A")
d<-data.frame(GROUP,VAR)

With table(), I get a nice frequency table, counting the occurrences of all combinations of the two variables:

d<-as.data.frame(table(d))
  GROUP VAR Freq
1    G1   A    2
2    G2   A    1
3    G3   A    0
4    G1   B    1
5    G2   B    1
6    G3   B    2

Now I would like to calculate the percentage of each value of VAR within each GROUP. So far I'm splitting the data.frame by GROUP, calculating the percentages separately for G1, G2 and G3, and merging the results afterwards.

d.G1<-d[d$GROUP=="G1",]
d.G1$per<-d.G1$Freq/sum(d.G1$Freq)
d.G1
  GROUP VAR Freq       per
1    G1   A    2 0.6666667
4    G1   B    1 0.3333333

...

d.merge<-rbind(d.G1,d.G2,d.G3)
d.merge
  GROUP VAR Freq       per
1    G1   A    2 0.6666667
4    G1   B    1 0.3333333
2    G2   A    1 0.5000000
5    G2   B    1 0.5000000
3    G3   A    0 0.0000000
6    G3   B    2 1.0000000

Is there a more elegant solution, for example using the reshape2 package?


Answer 1:


With the dplyr package you can do:

library(dplyr)

# d is the frequency table created with as.data.frame(table(d)) in the question
d <- d %>% group_by(GROUP) %>% mutate(per = Freq/sum(Freq))
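
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea, starting from the raw data in the question. The names raw, freq and res are only illustrative, and the ungroup() and arrange() calls are additions that make the result easier to compare with the desired output.

```r
library(dplyr)

# Raw observations from the question
GROUP <- c("G1", "G2", "G1", "G2", "G3", "G3", "G1")
VAR <- c("A", "B", "B", "A", "B", "B", "A")
raw <- data.frame(GROUP, VAR)

# Frequency table; table() keeps zero-count combinations such as (G3, A)
freq <- as.data.frame(table(raw))

# Share of each VAR value within its GROUP
res <- freq %>%
  group_by(GROUP) %>%
  mutate(per = Freq / sum(Freq)) %>%
  ungroup() %>%
  arrange(GROUP, VAR)

print(res)
```

Because the counts come from table(), the (G3, A) row is present with a count of 0 and a share of 0.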



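Answer 2:


This answer comes from a comment by @lukeA, and I think it's a really elegant solution if you only need the percentages. Applied to the original data.frame d (the raw observations, not the frequency table), margin 1 gives proportions within each GROUP: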

d<-as.data.frame(prop.table(table(d),1))
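
If you need the counts and the percentages side by side, one possible sketch in base R is to build both tables and combine them. The names raw, tab, prop and out are only illustrative. This relies on as.data.frame() flattening both tables in the same order, which holds here because they have identical dimensions.

```r
# Raw observations from the question
GROUP <- c("G1", "G2", "G1", "G2", "G3", "G3", "G1")
VAR <- c("A", "B", "B", "A", "B", "B", "A")
raw <- data.frame(GROUP, VAR)

tab <- table(raw)                      # rows: GROUP, columns: VAR
prop <- prop.table(tab, margin = 1)    # normalize within each row (GROUP)

# Long format: counts first, then attach the matching proportions
out <- as.data.frame(tab)
out$per <- as.data.frame(prop)$Freq

print(out)
```

Using margin = 2 instead would give the share of each GROUP within a VAR value, which is not what the question asks for.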



Answer 3:


Using data.table, you can do it as below:

library(data.table)
GROUP<-c("G1","G2","G1","G2","G3","G3","G1")
VAR<-c("A","B","B","A","B","B","A")
DT <-data.table(GROUP,VAR)

# Count each (GROUP, VAR) combination
DT1 <- DT[, list(Count=.N), by=.(GROUP, VAR)]
# dcast and melt to get all combinations of GROUP and VAR,
# as in your output. You can remove these steps if not all
# combinations are required
DT2 <- dcast(DT1, GROUP ~ VAR, value.var="Count")
DT3 <- melt(DT2, id.vars="GROUP")
# Replace NA counts with zero (by reference, so the change is kept)
DT3[is.na(value), value := 0L]
# Create percentage
DT3[, percent:=value/sum(value, na.rm=TRUE), by=GROUP]

I tried to keep the output the same as yours, hence the dcast and melt steps. They can be omitted if you do not need the zero-count combinations.
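
As an alternative to the dcast and melt round trip, the missing combinations can also be filled in with a cross join. This is only a sketch of a different technique, not the approach used above: CJ() builds every GROUP and VAR pair, and joining the counts onto that grid leaves NA where a pair never occurred. The names counts, grid and full are illustrative.

```r
library(data.table)

DT <- data.table(
  GROUP = c("G1", "G2", "G1", "G2", "G3", "G3", "G1"),
  VAR   = c("A", "B", "B", "A", "B", "B", "A")
)

# Count the combinations that actually occur
counts <- DT[, .(Count = .N), by = .(GROUP, VAR)]

# Every GROUP x VAR pair, including ones never observed
grid <- CJ(GROUP = unique(DT$GROUP), VAR = unique(DT$VAR))

# Join counts onto the full grid; unobserved pairs get NA
full <- counts[grid, on = .(GROUP, VAR)]
full[is.na(Count), Count := 0L]

# Share of each VAR value within its GROUP
full[, percent := Count / sum(Count), by = GROUP]

print(full)
```

The result keeps the original column names (GROUP, VAR, Count) instead of the variable and value columns that melt produces.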



Source: https://stackoverflow.com/questions/35651058/r-calculating-grouped-frequency-table-with-percentage
