Show % instead of counts in charts of categorical variables

Anonymous (unverified), submitted 2019-12-03 01:48:02

Question:

I'm plotting a categorical variable, and instead of showing the counts for each category value, I'm looking for a way to get ggplot to display the percentage of values in that category. Of course, it is possible to create another variable with the calculated percentage and plot that one, but I have to do it several dozen times and I hope to achieve it in one command.
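The "extra variable" route mentioned here can be packaged into a helper so that each variable takes a single call. A minimal sketch: `percent_table()` is a hypothetical helper name, the sample vector is made up, and the plotting step assumes ggplot2 2.2.0 or later (for `geom_col()`).

```r
# Hypothetical helper: share of each category in a vector (NAs dropped)
percent_table <- function(x) {
  tab <- table(x, useNA = "no")              # counts per category
  out <- as.data.frame(prop.table(tab),      # counts / total, sums to 1
                       stringsAsFactors = FALSE)
  names(out) <- c("category", "share")
  out
}

shares <- percent_table(c("aa", "bb", "bb", "cc", "aa", "aa"))
print(shares)

# Plot the precomputed shares; geom_col() draws the y values as given
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(shares, aes(x = category, y = share)) +
    geom_col() +
    scale_y_continuous(labels = scales::percent)
  print(p)
}
```

The trade-off is an extra data frame per variable, in exchange for a chart whose heights are exactly the numbers you can inspect in `shares`.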

I was experimenting with something like

qplot(mydataf) +
  stat_bin(aes(n = nrow(mydataf), y = ..count../n)) +
  scale_y_continuous(formatter = "percent")

but I must be using it incorrectly, as I got errors.

To easily reproduce the setup, here's a simplified example:

mydata 

In the real case I'll probably use ggplot instead of qplot, but the right way to use stat_bin still eludes me.

I've also tried these four approaches:

ggplot(mydataf, aes(y = (..count..)/sum(..count..))) +
  scale_y_continuous(formatter = 'percent');

ggplot(mydataf, aes(y = (..count..)/sum(..count..))) +
  scale_y_continuous(formatter = 'percent') + geom_bar();

ggplot(mydataf, aes(x = levels(mydataf), y = (..count..)/sum(..count..))) +
  scale_y_continuous(formatter = 'percent');

ggplot(mydataf, aes(x = levels(mydataf), y = (..count..)/sum(..count..))) +
  scale_y_continuous(formatter = 'percent') + geom_bar();

but all 4 give:

Error: ggplot2 doesn't know how to deal with data of class factor 

The same error appears for the simple case of

ggplot (data=mydataf, aes(levels(mydataf))) +   geom_bar() 

so it's clearly something about how ggplot interacts with a single vector. I'm scratching my head; googling for that error gives a single result.
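The error comes from passing a bare factor as the `data` argument: `ggplot()` expects a data frame (or an object it can convert with `fortify()`), and aesthetics then refer to its columns. A minimal sketch, using a made-up factor and the column name `foo` (the name the answers below map to `x`):

```r
# Hypothetical stand-in for the question's factor
mydataf <- factor(c("aa", "bb", "bb", "cc", "aa", "aa"))

# ggplot() needs a data frame; wrap the factor as a column named foo
df <- data.frame(foo = mydataf)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = foo)) + geom_bar()   # counts per level
  print(p)
}
```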

Answer 1:

Since this was answered, there have been some meaningful changes to the ggplot syntax. Summing up the discussion in the comments above:

require(ggplot2)
require(scales)
p

Here's a reproducible example using mtcars:

ggplot(mtcars, aes(x = factor(hp))) +
  geom_bar(aes(y = (..count..)/sum(..count..))) +
  ## scale_y_continuous(labels = percent_format()) #version 3.0.9
  scale_y_continuous(labels = percent) #version 3.1.0

This question is currently the #1 hit on Google for 'ggplot count vs percentage histogram', so hopefully this helps distill all the information currently housed in comments on the accepted answer.
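With ggplot2 3.3.0 and later, the `..count..` notation used above is superseded by `after_stat()`. A sketch of the same mtcars chart in that style, which also reads the computed bar heights back with `ggplot_build()` and compares them against base R's `prop.table()`; the version threshold in the guard is an assumption about where `after_stat()` is available.

```r
# Base R reference: share of cars at each distinct hp value
expected <- as.vector(prop.table(table(mtcars$hp)))

if (requireNamespace("ggplot2", quietly = TRUE) &&
    utils::packageVersion("ggplot2") >= "3.3.0") {
  library(ggplot2)
  p <- ggplot(mtcars, aes(x = factor(hp))) +
    # after_stat() replaces the older ..count.. notation
    geom_bar(aes(y = after_stat(count / sum(count)))) +
    scale_y_continuous(labels = scales::percent)
  # The bar heights ggplot2 computes should match the base R shares
  heights <- ggplot_build(p)$data[[1]]$y
  stopifnot(isTRUE(all.equal(heights, expected)))
  print(p)
}
```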

Remark: If hp is not set as a factor, ggplot returns: [plot image not preserved]

Answer 2:

This modified code should work:

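p <- ggplot(mydataf, aes(x = foo)) +
  geom_bar(aes(y = (..count..)/sum(..count..))) +
  # older ggplot2 used scale_y_continuous(formatter = 'percent');
  # newer versions take the formatting function through labels
  scale_y_continuous(labels = scales::percent)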

If your data has NAs and you don't want them included in the plot, pass na.omit(mydataf) as the data argument to ggplot.

Hope this helps.
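Dropping NAs changes the denominator, not just the number of bars: by default `geom_bar()` usually draws missing values as a separate NA bar, which also counts toward `sum(..count..)`. This base R sketch, on made-up data, mirrors both cases with `table()`; `useNA = "ifany"` keeps NA as a category the way the plot does.

```r
# Hypothetical data with one missing value
df <- data.frame(foo = factor(c("aa", "bb", NA, "bb", "aa", "aa")))
clean <- na.omit(df)   # drops the row whose foo is NA

# NA kept as its own category: it takes a share of the total
share_all <- prop.table(table(df$foo, useNA = "ifany"))
# NA rows removed first: shares are relative to the 5 non-missing rows
share_clean <- prop.table(table(clean$foo))

print(share_all)
print(share_clean)
```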



Answer 3:

With ggplot2 version 2.1.0, it is:

+ scale_y_continuous(labels = scales::percent) 
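`labels` accepts a function: ggplot2 passes it the axis break values and uses the strings it returns as tick labels. `scales::percent` is one such function; `pct_labels()` below is a hypothetical hand-written equivalent, shown only to make the mechanism visible, and the two-bar data frame is made up.

```r
# Hypothetical formatter: any function from break values to strings works
pct_labels <- function(x) paste0(format(100 * x, trim = TRUE), "%")

print(pct_labels(c(0, 0.25, 0.5)))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  shares <- data.frame(group = c("a", "b"), share = c(0.25, 0.75))
  p <- ggplot(shares, aes(x = group, y = share)) +
    geom_col() +
    scale_y_continuous(labels = pct_labels)   # or labels = scales::percent
  print(p)
}
```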


Answer 4:

As of March 2017, with ggplot2 2.2.1, I think the best solution is the one explained in Hadley Wickham's book R for Data Science:

ggplot(mydataf) + stat_count(mapping = aes(x=foo, y=..prop.., group=1)) 

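stat_count computes two variables: count, which is used by default, and prop, the proportion, which you can map to y instead. The group=1 matters because prop is computed within each group; without it, every bar is its own group and shows a proportion of 1.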
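A sketch that checks the role of `group = 1` by building the chart with and without it and reading the computed `prop` column back with `ggplot_build()`. The data are made up, and `after_stat(prop)` (ggplot2 3.3.0 or later, an assumption the guard encodes) stands in for the older `..prop..` spelling.

```r
# Hypothetical data: 3 x "aa", 2 x "bb", 1 x "cc"
df <- data.frame(foo = factor(c("aa", "bb", "bb", "cc", "aa", "aa")))
# Base R reference: share of all rows in each category
expected_prop <- as.vector(prop.table(table(df$foo)))

if (requireNamespace("ggplot2", quietly = TRUE) &&
    utils::packageVersion("ggplot2") >= "3.3.0") {
  library(ggplot2)
  # Default grouping: every x level is its own group, so prop is 1 per bar
  p_default <- ggplot(df) + stat_count(aes(x = foo, y = after_stat(prop)))
  # group = 1: all bars share one group, so prop is count / total rows
  p_grouped <- ggplot(df) +
    stat_count(aes(x = foo, y = after_stat(prop), group = 1))

  prop_default <- ggplot_build(p_default)$data[[1]]$prop
  prop_grouped <- ggplot_build(p_grouped)$data[[1]]$prop
  stopifnot(all(prop_default == 1))
  stopifnot(isTRUE(all.equal(prop_grouped, expected_prop)))
}
```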



Answer 5:

If you want percentages on the y-axis and labeled on the bars:

library(ggplot2)
library(scales)
ggplot(mtcars, aes(x = as.factor(am))) +
  geom_bar(aes(y = (..count..)/sum(..count..))) +
  geom_text(aes(y = ((..count..)/sum(..count..)),
                label = scales::percent((..count..)/sum(..count..))),
            stat = "count", vjust = -0.25) +
  scale_y_continuous(labels = percent) +
  labs(title = "Manual vs. Automatic Frequency", y = "Percent", x = "Automatic Transmission")

When adding the bar labels, you may wish to omit the y-axis for a cleaner chart by adding this to the end:

  theme(
    axis.text.y = element_blank(),
    axis.ticks = element_blank(),
    axis.title.y = element_blank()
  )
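Each bar label is simply that category's share of all rows. To check the numbers, or to show them in a table beside the chart, the same shares can be computed in base R. The one-decimal `sprintf()` format here is an assumption and may not match `scales::percent`'s rounding in every case.

```r
tab <- table(mtcars$am)               # am: 0 = automatic, 1 = manual
shares <- as.vector(prop.table(tab))  # fraction of all 32 cars
labels <- sprintf("%.1f%%", 100 * shares)
names(labels) <- names(tab)
print(labels)
```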



Answer 6:

If you want percentage labels but actual Ns on the y axis, try this:

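library(ggplot2)
library(scales)
perbar <- function(xx) {
  q <- ggplot(data = data.frame(xx), aes(x = xx)) +
    # geom_histogram() bins with stat_bin(), the same stat the labels use,
    # so each percentage label lines up with its bar
    geom_histogram(aes(y = (..count..)), fill = "orange")
  q <- q + geom_text(aes(y = (..count..), label = scales::percent((..count..)/sum(..count..))),
                     stat = "bin", colour = "darkgreen")
  q
}
perbar(mtcars$disp)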


Answer 7:

Here is a workaround for faceted data (the accepted answer by @Andrew does not work in this case). The idea is to calculate the percentages with dplyr and then plot them with geom_col.

library(ggplot2)
library(scales)
library(magrittr)
library(dplyr)

binwidth <- …
mtcars.stats <- mtcars %>%
  group_by(cyl) %>%
  mutate(bin = cut(hp, breaks = seq(0, 400, binwidth),
                   labels = seq(0 + binwidth, 400, binwidth) - (binwidth/2)),
         n = n()) %>%
  group_by(cyl, bin) %>%
  summarise(p = n()/n[1]) %>%
  ungroup() %>%
  mutate(bin = as.numeric(as.character(bin)))

ggplot(mtcars.stats, aes(x = bin, y = p)) +
  geom_col() +
  scale_y_continuous(labels = percent) +
  facet_grid(cyl ~ .)
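The percentages this pipeline computes are shares within each cyl facet. As a cross-check without dplyr, `prop.table()` with `margin = 1` normalizes each row of a cyl-by-bin table so it sums to 1. This is a swapped-in base R technique, not the answer's method, and the bin width of 25 is an assumption for illustration.

```r
binwidth <- 25                              # assumed bin width for this sketch
breaks <- seq(0, 400, binwidth)
hp_bin <- cut(mtcars$hp, breaks = breaks)   # intervals (0,25], (25,50], ...

# Rows = cylinder counts, columns = hp bins; margin = 1 scales each row to sum to 1
within_cyl <- prop.table(table(cyl = mtcars$cyl, hp_bin), margin = 1)
print(round(within_cyl, 2))
```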

This is the plot: [plot image not preserved]
