How to add a number of observations per group and use group mean in ggplot2 boxplot?

我与影子孤独终老i submitted on 2019-11-26 09:38:01

Question


I am doing a basic boxplot where y = age and x = patient groups:

age <- ggplot(data, aes(factor(group2), age)) + ylim(15, 80)
age + geom_boxplot(fill = "grey80", colour = "#3366FF")

I was hoping you could help me out with a few things:

1) Is it possible to show the number of observations per group above each group's boxplot (but NOT on the x axis, where my group labels are) without having to do this in Paint :)? I have tried using:

age + annotate("text", x = "CON", y = 60, label = "25")

where CON is the 1st group and y = 60 is just above the boxplot for this group. However, the command didn't work. I assume this is because x is read as a continuous rather than a categorical variable.

2) Also, although there are plenty of questions about using the mean rather than the median for boxplots, I still haven't found code that works for me.

3) On the same matter, is there a way to include the group mean in the boxplot? Perhaps using

age + stat_summary(fun.y = mean, colour = "red", geom = "point")

which, however, only adds a dot where the mean lies. Or again using

age + annotate("text", x = "CON", y = 30, label = "30")

where CON is the 1st group and y = 30 is roughly the group's mean age. Knowing how flexible and rich ggplot2 syntax is, I was hoping there is a more elegant way that uses the real stats output rather than annotate.

Any suggestions/links would be much appreciated!

Thanks!!


Answer 1:


Is this anything like what you're after? With stat_summary, as requested:

library(ggplot2)

# label with the number of observations, placed just above the median
give.n <- function(x) {
  # experiment with the multiplier to find the perfect position
  data.frame(y = median(x) * 1.05, label = length(x))
}

# label with the group mean, placed just below the median
mean.n <- function(x) {
  # experiment with the multiplier to find the perfect position
  data.frame(y = median(x) * 0.97, label = round(mean(x), 2))
}

# plot: fun.data supplies both the position (y) and the text (label),
# so no separate fun.y argument is needed
ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot(fill = "grey80", colour = "#3366FF") +
  stat_summary(fun.data = give.n, geom = "text") +
  stat_summary(fun.data = mean.n, geom = "text", colour = "red")

The black number is the number of observations and the red number is the mean value. joran's answer shows how to put the numbers at the top of the boxes.

hat-tip: https://stackoverflow.com/a/3483657/1036500
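
If the labels should sit at the top of each box rather than next to the median, the same stat_summary approach can be adapted. The following is only a sketch: the helper names n_at_top and mean_point are made up for illustration, and the vjust value is a guess that may need tuning for other data.

```r
library(ggplot2)

# Hypothetical helper: count label positioned at the group's maximum value
n_at_top <- function(x) {
  data.frame(y = max(x), label = paste0("n = ", length(x)))
}

# Hypothetical helper: y position of the group mean, for drawing a point
mean_point <- function(x) {
  data.frame(y = mean(x))
}

p <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot(fill = "grey80", colour = "#3366FF") +
  # vjust = -0.5 nudges the text slightly above the upper whisker
  stat_summary(fun.data = n_at_top, geom = "text", vjust = -0.5) +
  stat_summary(fun.data = mean_point, geom = "point", colour = "red")

print(p)
```

Anchoring the label at max(x) keeps it clear of the box whatever the spread of the group, which avoids hand-tuning a multiplier on the median.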




Answer 2:


I think this is what you're looking for maybe?

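library(plyr)     # provides ddply() and .()
library(ggplot2)

# compute the boxplot statistics and the group size for each cyl group
myboxplot <- ddply(mtcars,
                    .(cyl),
                    summarise,
                    min = min(mpg),
                    q1 = quantile(mpg, 0.25),
                    med = median(mpg),
                    q3 = quantile(mpg, 0.75),
                    max = max(mpg),
                    lab = length(cyl))
# draw the boxes from the precomputed statistics and put the counts on top
ggplot(myboxplot, aes(x = factor(cyl))) +
    geom_boxplot(aes(lower = q1, upper = q3, middle = med, ymin = min, ymax = max), stat = "identity") +
    geom_text(aes(y = max, label = lab), vjust = 0)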

I just realized I mistakenly used the median when you were asking about the mean, but you can obviously use whatever function for the middle aesthetic you please.
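
For completeness, here is a sketch of the same idea with the mean as the middle line. It uses base R instead of plyr to build the summary table; the name box_stats and its column names are chosen here for illustration only.

```r
library(ggplot2)

# One row of boxplot statistics per cyl group, with the mean as "middle"
box_stats <- do.call(rbind, lapply(split(mtcars$mpg, mtcars$cyl), function(x) {
  data.frame(ymin   = min(x),
             lower  = unname(quantile(x, 0.25)),
             middle = mean(x),               # mean instead of median
             upper  = unname(quantile(x, 0.75)),
             ymax   = max(x),
             n      = length(x))
}))
box_stats$cyl <- factor(rownames(box_stats))  # row names carry the group

p <- ggplot(box_stats, aes(x = cyl)) +
  geom_boxplot(aes(ymin = ymin, lower = lower, middle = middle,
                   upper = upper, ymax = ymax), stat = "identity") +
  geom_text(aes(y = ymax, label = n), vjust = -0.5)

print(p)
```

One caveat: in strongly skewed groups the mean can fall outside the interquartile range, in which case the middle line is drawn outside the box.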




Answer 3:


Answer to the first problem: to show a value above a box, give x as the numeric position of the group, not as its level name. So, to place the value above the first box, use x = 1.

library(ggplot2)
data(ToothGrowth)
ggplot(ToothGrowth, aes(supp, len)) + geom_boxplot() +
   annotate("text", x = 1, y = 32, label = 30)

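
Rather than typing the count and position by hand, both can be computed from the data and passed to annotate as vectors. This is a sketch only; the variable names counts and tops and the offset of 1 above each group's maximum are arbitrary choices made here.

```r
library(ggplot2)
data(ToothGrowth)

# observations per group, and the highest value in each group
counts <- table(ToothGrowth$supp)
tops   <- tapply(ToothGrowth$len, ToothGrowth$supp, max)

p <- ggplot(ToothGrowth, aes(supp, len)) +
  geom_boxplot() +
  # x = 1, 2, ... are the numeric positions of the factor levels
  annotate("text",
           x = seq_along(counts),
           y = as.numeric(tops) + 1,
           label = as.vector(counts))

print(p)
```

Because table and tapply both order their results by factor level, position i lines up with the i-th box.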

Source: https://stackoverflow.com/questions/15660829/how-to-add-a-number-of-observations-per-group-and-use-group-mean-in-ggplot2-boxp
