Barplot with ggplot2 of two categorical variables, using facet_wrap by a third variable and displaying percentages

醉酒当歌 submitted on 2021-02-07 10:14:17


I would like to draw a bar plot in ggplot2 of one categorical variable grouped by a second categorical variable, and use facet_wrap to split it into separate panels by a third. Then I would like to show the percentage for each bar. Here is a reproducible example:

test <- data.frame(
  test1 = sample(letters[1:2], 100, replace = TRUE), 
  test2 = sample(letters[3:5], 100, replace = TRUE),
  test3 = sample(letters[9:11], 100, replace = TRUE)
)

library(ggplot2)
library(scales)  # supplies percent for the axis labels

ggplot(test, aes(x = factor(test1))) +
  geom_bar(aes(fill = factor(test2), y = ..prop.., group = factor(test2)), position = "dodge") +
  facet_wrap(~ test3) +
  scale_y_continuous("Percentage (%)", limits = c(0, 1), breaks = seq(0, 1, by = 0.1), labels = percent) +
  theme(plot.title = element_text(hjust = 0.5), panel.grid.major.x = element_blank())

This gives me a bar plot with the percentage of test2 by test1 for each level of test3. I would like to show the percentage on top of each bar. I would also like to change the legend title on the right from factor(test2) to Test2.

It may be easiest to summarize the data yourself so that you can create a column with the percentage labels you want. (Note that, as is, I'm not sure what you want your percentages to show: in facet i, group b, there is a column that is nearly 90%, and two columns that are greater than or equal to 50%. Is that intended?)

Libraries and your example data frame:
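
A minimal sketch of that setup, assuming dplyr and ggplot2 are the libraries meant (the later steps use %>%, group_by and ggplot). The set.seed() call is an addition so the random sample is repeatable, which means the counts will not match the output printed further down:

```r
# Assumed libraries: dplyr for %>%, group_by, summarize and mutate; ggplot2 for plotting
library(dplyr)
library(ggplot2)

# set.seed() is an addition for repeatability; the seed value is arbitrary
set.seed(42)

# Example data frame from the question
test <- data.frame(
  test1 = sample(letters[1:2], 100, replace = TRUE),
  test2 = sample(letters[3:5], 100, replace = TRUE),
  test3 = sample(letters[9:11], 100, replace = TRUE)
)
```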


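First, group by all three columns (note the order: test2 comes last), then summarize to count the rows in each combination. Because summarize drops the last grouping level, the mutate that follows computes each count as a share of its test1/test3 group; here I've multiplied by 100 and rounded to one decimal place to get the column height and label.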

test.grouped <- test %>%
  group_by(test1, test3, test2) %>%
  summarize(t2.len = length(test2)) %>%
  mutate(t2.prop = round(t2.len / sum(t2.len) * 100, 1))

> test.grouped
# A tibble: 18 x 5
# Groups:   test1, test3 [6]
    test1  test3  test2 t2.len t2.prop
   <fctr> <fctr> <fctr>  <int>   <dbl>
 1      a      i      c      4    30.8
 2      a      i      d      5    38.5
 3      a      i      e      4    30.8
 4      a      j      c      3    20.0
 5      a      j      d      8    53.3
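
As a quick check on what these percentages mean, the sketch below rebuilds the summary on a seeded copy of the example data and sums t2.prop within each test1/test3 combination; each total should be 100 up to rounding. The seed, the explicit .groups arguments (which assume dplyr 1.0.0 or later) and the names check and total are assumptions made for illustration.

```r
library(dplyr)

set.seed(42)  # arbitrary seed, added for repeatability
test <- data.frame(
  test1 = sample(letters[1:2], 100, replace = TRUE),
  test2 = sample(letters[3:5], 100, replace = TRUE),
  test3 = sample(letters[9:11], 100, replace = TRUE)
)

# Same summary as above; .groups = "drop_last" spells out the default
# behaviour of keeping the test1/test3 grouping after summarize
test.grouped <- test %>%
  group_by(test1, test3, test2) %>%
  summarize(t2.len = length(test2), .groups = "drop_last") %>%
  mutate(t2.prop = round(t2.len / sum(t2.len) * 100, 1))

# Sum the percentages within each test1/test3 group
check <- test.grouped %>%
  summarize(total = sum(t2.prop), .groups = "drop")

print(check)
```

If the percentages should instead add up within each test2 level (which is what group = factor(test2) with ..prop.. computes in the question's plot), the grouping order would need to change accordingly.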

Build the plot from the summarized data, with geom_text using the proportion column as the label:

ggplot(test.grouped, aes(x = test1, 
                         y = t2.prop, 
                         fill = test2, 
                         group = test2)) +  
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  geom_text(aes(label = paste(t2.prop, "%", sep = ""), 
                group = test2), 
            position = position_dodge(width = 0.9),
            vjust = -0.8)+
  facet_wrap(~ test3) + 
  scale_y_continuous("Percentage (%)") +
  scale_x_discrete("") + 
  theme(plot.title = element_text(hjust = 0.5), panel.grid.major.x = element_blank())
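
The question also asked for the legend title to read Test2. In the plot above, fill is mapped to the bare column test2, so the legend title defaults to test2 rather than factor(test2). A minimal sketch of setting it explicitly with labs(fill = ...), shown on a seeded copy of the example data (the seed and the object name p are assumptions made for illustration):

```r
library(dplyr)
library(ggplot2)

set.seed(42)  # arbitrary seed, added for repeatability
test <- data.frame(
  test1 = sample(letters[1:2], 100, replace = TRUE),
  test2 = sample(letters[3:5], 100, replace = TRUE),
  test3 = sample(letters[9:11], 100, replace = TRUE)
)

# Percentage of each test2 level within every test1/test3 combination
test.grouped <- test %>%
  group_by(test1, test3, test2) %>%
  summarize(t2.len = length(test2), .groups = "drop_last") %>%
  mutate(t2.prop = round(t2.len / sum(t2.len) * 100, 1))

p <- ggplot(test.grouped, aes(x = test1, y = t2.prop, fill = test2, group = test2)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  geom_text(aes(label = paste(t2.prop, "%", sep = "")),
            position = position_dodge(width = 0.9),
            vjust = -0.8) +
  facet_wrap(~ test3) +
  # fill = sets the legend title; x = NULL removes the x-axis title
  labs(x = NULL, y = "Percentage (%)", fill = "Test2")

print(p)
```

scale_fill_discrete(name = "Test2") would be another way to set the same title.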

