Question
I have a data set like this ->
library(ggplot2)
response <- c("Yes","No")
gend <- c("Female","Male")
purchase <- sample(response, 20, replace = TRUE)
gender <- sample(gend, 20, replace = TRUE)
df <- as.data.frame(purchase)
df <- cbind(df,gender)
so head(df) looks like this ->
  purchase gender
1      Yes Female
2       No   Male
3       No Female
4       No Female
5      Yes Female
6       No Female
Also, so you can validate my examples, here is table(df) for my particular sampling (please don't worry about matching my percentages).
        gender
purchase Female Male
     No       6    3
     Yes      4    7
I want a "histogram" showing Gender, but split by Purchase. I have gone this far ->
ggplot(df) +
geom_bar(aes(y = (..count..)/sum(..count..)),position = "dodge") +
aes(gender, fill = purchase)
which generates ->
[Figure: histogram with split bins, by percentage, but not the aggregate level I want]
The Y axis has Percentage as I want, but it has each bar of the chart as a percentage of the whole chart.
What I want is the two "Female" bars to each be a percentage of their respective "Purchase". So in the chart above I would like the four bars to be 66%, 36%, 33%, 64%, in that order.
I have tried with geom_histogram to no avail. I have checked SO, searched, ggplot documentation, and several books.
Regarding the suggestion to look at the previous question about facets; that does work, but I had hoped to keep the chart visually as it is above, as opposed to split into "two charts". So...
Anyone know how to do this?
Thanks.
Answer 1:
Regarding the percentages you want, is the denominator based on gender, or purchase? In the example given above, 66% for female & no purchase would be a result of 6 divided by the sum of no purchases (6+3) rather than the sum of all females (6+4).
It's definitely possible to plot that, but I'm not sure if the result would be intuitive to interpret. I got confused myself for a while.
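As a quick check of the two candidate denominators, the sketch below rebuilds the question's table(df) counts by hand and normalises them each way with base R's prop.table(). The object names tab, by_purchase and by_gender are illustrative and not from the original post.

```r
# Counts copied from the question's table(df); matrix() fills column by column,
# so the first column is Female (No = 6, Yes = 4) and the second is Male.
tab <- matrix(c(6, 4, 3, 7), nrow = 2,
              dimnames = list(purchase = c("No", "Yes"),
                              gender = c("Female", "Male")))

# Denominator = purchase totals: each row sums to 1.
by_purchase <- prop.table(tab, margin = 1)

# Denominator = gender totals: each column sums to 1.
by_gender <- prop.table(tab, margin = 2)

print(round(by_purchase, 2))
print(round(by_gender, 2))
```

Read in bar order (Female/No, Female/Yes, Male/No, Male/Yes), by_purchase gives 6/9, 4/11, 3/9, 7/11, which is the pattern the question asks for, while by_gender gives 60%, 40%, 30%, 70%.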
The following hack makes use of the weight aesthetic. I've used purchase as the grouping variable here based on the expected output described in the question, though I think gender makes more sense (as per TTNK's answer above):
library(ggplot2)
library(dplyr)  # provides %>%, group_by(), mutate(), ungroup()
df <- data.frame(purchase = c(rep("No", 6), rep("Yes", 4), rep("No", 3), rep("Yes", 7)),
                 gender = c(rep("Female", 10), rep("Male", 10)))
ggplot(df %>%
         group_by(purchase) %>% # change this to gender if that's the intended denominator
         mutate(w = 1/n()) %>% ungroup()) + # each row weighs 1 / (size of its group)
  aes(gender, fill = purchase, weight = w) +
  geom_bar(position = "dodge") + # bar height = sum of the weights in that bar
  scale_y_continuous(name = "percent", labels = scales::percent)
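To see why the weights produce the requested heights: geom_bar() sums the weight aesthetic within each bar, so a bar's height is its row count divided by the size of its purchase group. A minimal base R sketch of that sum, assuming the same 20-row data as above (the column w and the object heights are illustrative names):

```r
# Same data as in the answer: 6 No + 4 Yes for Female, 3 No + 7 Yes for Male.
df <- data.frame(purchase = rep(c("No", "Yes", "No", "Yes"), times = c(6, 4, 3, 7)),
                 gender = rep(c("Female", "Male"), each = 10))

# Per-row weight: 1 / (number of rows sharing this row's purchase value).
df$w <- 1 / ave(seq_len(nrow(df)), df$purchase, FUN = length)

# Sum the weights per (purchase, gender) cell, which is what the bars display.
heights <- xtabs(w ~ purchase + gender, data = df)
print(round(heights, 3))
```

Each purchase row of heights sums to 1, which confirms that purchase is the denominator in this version.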
Answer 2:
Try something like this:
library(tidyverse)
df %>%
count(purchase, gender) %>%
ungroup %>%
group_by(gender) %>%
mutate(prop = prop.table(n)) %>%
ggplot(aes(gender, prop, group = purchase)) +
geom_bar(aes(fill = purchase), stat = "identity", position = "dodge")
The first 5 lines of the pipeline create a column prop (for "proportion"), which holds each count as a proportion of its gender total. To get there, you first count each purchase by gender (similar to the output of table(df)). Ungrouping then regrouping only by gender gives the aggregation we want.
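For reference, here is a sketch of what the intermediate data frame looks like just before plotting. It assumes the fixed 20-row data from Answer 1 rather than the question's random sample, and props is an illustrative name:

```r
library(dplyr)

# Fixed data matching the question's table(df) counts.
df <- data.frame(purchase = rep(c("No", "Yes", "No", "Yes"), times = c(6, 4, 3, 7)),
                 gender = rep(c("Female", "Male"), each = 10))

props <- df %>%
  count(purchase, gender) %>%        # one row per purchase/gender combination
  group_by(gender) %>%               # the denominator is the gender total
  mutate(prop = prop.table(n)) %>%   # n / sum(n) within each gender
  ungroup()

print(props)
```

With gender as the denominator the two bars for each gender sum to 100% (60% and 40% for Female, 30% and 70% for Male), which differs from the 66%, 36%, 33%, 64% requested in the question.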
Source: https://stackoverflow.com/questions/45744111/r-percentage-by-bin-in-histogram-ggplot