Question
Given the following data.frame, I would like to calculate the occurrences of each value of VAR and the percentage of these occurrences within each level of the grouping variable GROUP:
GROUP<-c("G1","G2","G1","G2","G3","G3","G1")
VAR<-c("A","B","B","A","B","B","A")
d<-data.frame(GROUP,VAR)
With table(), I get a nice frequency table that counts the occurrences of all combinations of the two variables:
d<-as.data.frame(table(d))
  GROUP VAR Freq
1    G1   A    2
2    G2   A    1
3    G3   A    0
4    G1   B    1
5    G2   B    1
6    G3   B    2
Now I would like to calculate the percentage of each value of VAR by GROUP. So far I am splitting the data.frame by GROUP, calculating the percentage separately for G1, G2 and G3, and merging the pieces afterwards.
d.G1<-d[d$GROUP=="G1",]
d.G1$per<-d.G1$Freq/sum(d.G1$Freq)
d.G1
  GROUP VAR Freq       per
1    G1   A    2 0.6666667
4    G1   B    1 0.3333333
...
d.merge<-rbind(d.G1,d.G2,d.G3)
d.merge
  GROUP VAR Freq       per
1    G1   A    2 0.6666667
4    G1   B    1 0.3333333
2    G2   A    1 0.5000000
5    G2   B    1 0.5000000
3    G3   A    0 0.0000000
6    G3   B    2 1.0000000
Is there a more elegant solution, for example using the reshape2 package?
Answer 1:
With the dplyr package you can do:
require(dplyr)
d <- d %>% group_by(GROUP) %>% mutate(per = Freq/sum(Freq))
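For reference, here is a minimal end-to-end sketch of the same idea, starting from the raw vectors in the question. The names raw and res are only illustrative, and the trailing ungroup() is an optional addition that is not part of the answer above; it simply avoids carrying the grouping into later steps.

```r
library(dplyr)

# Raw data from the question
GROUP <- c("G1", "G2", "G1", "G2", "G3", "G3", "G1")
VAR   <- c("A", "B", "B", "A", "B", "B", "A")
raw   <- data.frame(GROUP, VAR)   # 'raw' is an illustrative name

# table() keeps every GROUP/VAR combination, including zero counts
res <- as.data.frame(table(raw)) %>%
  group_by(GROUP) %>%                  # one group per level of GROUP
  mutate(per = Freq / sum(Freq)) %>%   # share of each VAR within its GROUP
  ungroup()                            # optional: drop the grouping again

print(res)
```

Because the counts come from table(), the G3/A combination is kept with a frequency of 0 and a share of 0.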
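Answer 2:
This answer comes from a comment by @lukeA, and I think it is a really elegant solution if you only need the percentages. Apply it to the original data.frame d (GROUP and VAR only), before d is replaced by the frequency table: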
d<-as.data.frame(prop.table(table(d),1))
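If both the counts and the percentages are wanted, one possible base R sketch is to build the contingency table once and attach the row-wise proportions to its data.frame form. The names raw, tab and counts are illustrative only. The sketch relies on as.data.frame() and as.vector() both unrolling the table in the same column-major order.

```r
# Raw data from the question
GROUP <- c("G1", "G2", "G1", "G2", "G3", "G3", "G1")
VAR   <- c("A", "B", "B", "A", "B", "B", "A")
raw   <- data.frame(GROUP, VAR)   # 'raw' is an illustrative name

tab    <- table(raw)              # GROUP in rows, VAR in columns
counts <- as.data.frame(tab)      # columns: GROUP, VAR, Freq

# margin = 1 divides each cell by its row (GROUP) total
counts$per <- as.vector(prop.table(tab, margin = 1))

# Sort by GROUP so each group's rows sit together
counts <- counts[order(counts$GROUP, counts$VAR), ]
print(counts)
```

With margin = 1 the proportions sum to 1 within each GROUP; margin = 2 would instead normalise within each VAR.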
Answer 3:
Using data.table, you can do it as follows:
library(data.table)
GROUP<-c("G1","G2","G1","G2","G3","G3","G1")
VAR<-c("A","B","B","A","B","B","A")
DT <-data.table(GROUP,VAR)
# Create count
DT1 <- DT[, list(Count=.N), by=.(GROUP, VAR)]
# dcast and melt to get all combinations of GROUP and VAR,
# as in your output. You can skip these steps if the
# zero-count combinations are not required
DT2 <- dcast(DT1, GROUP ~ VAR, value.var = "Count")
DT3 <- melt(DT2, id.vars = "GROUP")
# Replace NA values with zero (by reference, so the change is kept)
DT3[is.na(value), value := 0L]
# Create percentage
DT3[, percent:=value/sum(value, na.rm=TRUE), by=GROUP]
I tried to match your output, hence the dcast and melt steps. They can be omitted if the zero-count combinations are not required.
Source: https://stackoverflow.com/questions/35651058/r-calculating-grouped-frequency-table-with-percentage