r percentage by bin in histogram ggplot

Submitted by 房东的猫 on 2020-06-26 06:19:20

Question


I have a data set like this ->

library(ggplot2)

response <- c("Yes","No")
gend <- c("Female","Male")

purchase <- sample(response, 20, replace = TRUE)
gender <- sample(gend, 20, replace = TRUE)

df <- data.frame(purchase, gender)

so head(df) looks like this ->

  purchase gender
1      Yes Female
2       No   Male
3       No Female
4       No Female
5      Yes Female
6       No Female

Also, so you can check my examples, here is table(df) for my particular sample.
(The data are sampled randomly, so please don't worry about matching my percentages.)

         gender
purchase Female Male
     No       6    3
     Yes      4    7

I want a "histogram" showing gender, but split by purchase. I have gotten this far ->

ggplot(df, aes(gender, fill = purchase)) + 
       geom_bar(aes(y = after_stat(count / sum(count))), position = "dodge")

which generates ->

[Figure: histogram with split bins, by percentage, but not the aggregate level I want]

The y axis shows percentages as I want, but each bar is a percentage of the whole chart. What I want is for the two "Female" bars to each be a percentage of their respective "purchase" group (and likewise for the two "Male" bars). So in the chart above I would like the four bars to be 66%, 36%, 33%, 64%, in that order.
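
As a quick check of the arithmetic, here is a minimal base-R sketch that rebuilds the counts from the table(df) output above by hand (the names tab, whole_chart and by_purchase are just for illustration). prop.table(tab) divides every cell by the grand total of 20, which is what the current chart shows; prop.table(tab, margin = 1) divides each cell by its purchase (row) total, which gives the bars described above.

```r
# Counts from the question's table(df) output, entered by hand
tab <- matrix(c(6, 3,
                4, 7),
              nrow = 2, byrow = TRUE,
              dimnames = list(purchase = c("No", "Yes"),
                              gender   = c("Female", "Male")))

# What the current chart shows: each cell divided by the grand total (20)
whole_chart <- prop.table(tab)

# What the question asks for: each cell divided by its purchase (row) total
by_purchase <- prop.table(tab, margin = 1)

round(100 * by_purchase)
```

Rounded, the per-purchase values come out as 67% and 33% for "No" and 36% and 64% for "Yes"; 6/9 is 0.667, so the first bar is closer to 67% than 66%.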

I have tried geom_histogram, to no avail. I have checked SO, searched the web, and read the ggplot documentation and several books.

Regarding the suggestion to look at the previous question about facets: that does work, but I had hoped to keep the chart looking as it does above rather than splitting it into "two charts". So...

Anyone know how to do this?

Thanks.


Answer 1:


Regarding the percentages you want, is the denominator based on gender, or purchase? In the example given above, 66% for female & no purchase would be a result of 6 divided by the sum of no purchases (6+3) rather than the sum of all females (6+4).

It's definitely possible to plot that, but I'm not sure if the result would be intuitive to interpret. I got confused myself for a while.

The following hack makes use of the weight aesthetic. I've used purchase as the grouping variable here based on the expected output described in the question, though I think gender makes more sense (as per TTNK's answer above):

library(ggplot2)
library(dplyr)

df <- data.frame(purchase = c(rep("No", 6), rep("Yes", 4), rep("No", 3), rep("Yes", 7)),
                 gender = c(rep("Female", 10), rep("Male", 10)))

ggplot(df %>% 
         group_by(purchase) %>% # change this to gender if that's the intended denominator
         mutate(w = 1/n()) %>%  # each row weighs 1 / (number of rows in its purchase group)
         ungroup()) + 
  aes(gender, fill = purchase, weight = w) +  # geom_bar() sums weights instead of counting rows
  geom_bar(position = "dodge") +
  scale_y_continuous(name = "percent", labels = scales::percent)
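
Why the weight trick works: geom_bar() adds up the weight of the rows in each bar instead of counting them. A minimal base-R sketch, assuming the same df as in this answer, computes the same weights with ave() instead of dplyr (an equivalent, not the answer's own code) and sums them per bar, which should match the plotted heights (heights is a name chosen for illustration):

```r
df <- data.frame(purchase = c(rep("No", 6), rep("Yes", 4), rep("No", 3), rep("Yes", 7)),
                 gender = c(rep("Female", 10), rep("Male", 10)))

# Base-R version of group_by(purchase) %>% mutate(w = 1/n()):
# ave() returns, for each row, the size of that row's purchase group
df$w <- 1 / ave(seq_along(df$purchase), df$purchase, FUN = length)

# geom_bar() adds up the weights of the rows in each bar,
# so these sums should match the plotted bar heights
heights <- aggregate(w ~ gender + purchase, data = df, FUN = sum)
heights
```

With purchase as the denominator, the two "No" bars sum to 1 and the two "Yes" bars sum to 1.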




Answer 2:


Try something like this:

library(tidyverse)

df %>% 
  count(purchase, gender) %>% 
  ungroup() %>% 
  group_by(gender) %>% 
  mutate(prop = prop.table(n)) %>% 
  ggplot(aes(gender, prop, group = purchase)) + 
  geom_col(aes(fill = purchase), position = "dodge")

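The first 5 lines create a column prop (for "proportion") holding each purchase count as a share of its gender's total, so the two bars within each gender add up to 100%.

To get there, you first count each purchase/gender combination (similar to the output of table(df)). Ungrouping and then regrouping by gender alone makes prop.table(n) divide by each gender's total; group by purchase instead if that is the intended denominator.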
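
To look at the intermediate table this pipeline builds before it reaches ggplot(), here is a base-R sketch of the same steps (an equivalent written without the tidyverse, not the answer's own code; counts is an illustrative name), using the hand-built df from the first answer:

```r
df <- data.frame(purchase = c(rep("No", 6), rep("Yes", 4), rep("No", 3), rep("Yes", 7)),
                 gender = c(rep("Female", 10), rep("Male", 10)))

# Like count(purchase, gender): one row per combination, count in `Freq`
counts <- as.data.frame(table(purchase = df$purchase, gender = df$gender))

# Like group_by(gender) %>% mutate(prop = prop.table(n)):
# each count divided by its gender's total
counts$prop <- ave(counts$Freq, counts$gender, FUN = prop.table)
counts
```

Each gender's two proportions add up to 1 (Female: 0.6 and 0.4; Male: 0.3 and 0.7), so this chart uses gender, not purchase, as the denominator.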



Source: https://stackoverflow.com/questions/45744111/r-percentage-by-bin-in-histogram-ggplot
