ggplot2: how to add sample numbers to density plot?

Anonymous (unverified), submitted on 2019-12-03 01:34:02

Question:

I am trying to generate a (grouped) density plot labelled with sample sizes.

Sample data:

set.seed(100)
df <- data.frame(ab.class = c(rep("A", 200), rep("B", 200)),
                 val = c(rnorm(200, 0, 1), rnorm(200, 1, 1)))

The unlabelled density plot is generated and looks as follows:

ggplot(df, aes(x = val, group = ab.class)) +
  geom_density(aes(fill = ab.class), alpha = 0.4)

What I want to do is add text labels somewhere near the peak of each density, showing the number of samples in each group. However, I cannot find the right combination of options to summarise the data in this way.

I tried to adapt the code suggested in this answer to a similar question on boxplots: https://stackoverflow.com/a/15720769/1836013

n_fun <- function(x){
  return(data.frame(y = max(x), label = paste0("n = ", length(x))))
}

ggplot(df, aes(x = val, group = ab.class)) +
  geom_density(aes(fill = ab.class), alpha = 0.4) +
  stat_summary(geom = "text", fun.data = n_fun)

However, this fails with Error: stat_summary requires the following missing aesthetics: y.

I also tried adding y = ..density.. within aes() for each of the geom_density() and stat_summary() layers, and in the ggplot() object itself... none of which solved the problem.

I know this could be achieved by manually adding labels for each group, but I was hoping for a solution that generalises, and e.g. allows the label colour to be set via aes() to match the densities.

Where am I going wrong?

Answer 1:

The y returned by fun.data is not the y aesthetic. stat_summary complains that it cannot find y, which must be specified either globally, in ggplot(df, aes(x = val, group = ab.class, y = ...)), or, if no global y is available, in the layer itself, in stat_summary(aes(y = ...)). fun.data then computes where to display the point/text/... at each x, based on the y values supplied from the data through aes. (I am not sure whether I have made this clear; I am not a native English speaker.)

Even if you specify y through aes, you won't get the desired result, because stat_summary computes a y at each x.
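
To illustrate the point about stat_summary, here is a minimal sketch of the setting where n_fun does work: a boxplot, where y = val is a real aesthetic and x is the discrete group, so fun.data receives the y values of each group. The boxplot layout is an assumption borrowed from the linked boxplot answer, not something the asker's density plot uses; df and n_fun are repeated from the question so the snippet runs on its own.

```r
library(ggplot2)

# Sample data from the question
set.seed(100)
df <- data.frame(ab.class = c(rep("A", 200), rep("B", 200)),
                 val = c(rnorm(200, 0, 1), rnorm(200, 1, 1)))

# fun.data is called with the y values found at each x position
n_fun <- function(x) {
  data.frame(y = max(x), label = paste0("n = ", length(x)))
}

# Here y = val is mapped, so stat_summary has something to summarise
p_box <- ggplot(df, aes(x = ab.class, y = val)) +
  geom_boxplot() +
  stat_summary(geom = "text", fun.data = n_fun, vjust = -0.5)
```

In a density plot there is no y in the data (it is computed by the density stat) and x is continuous, which is why the same layer cannot be reused there.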

However, you can add text at the desired positions with geom_text or annotate:

# save the plot as p
p <- ggplot(df, aes(x = val, group = ab.class)) +
  geom_density(aes(fill = ab.class), alpha = 0.4)

# build the data displayed on the plot
p.data <- ggplot_build(p)$data[[1]]

# 'scaled' is the density rescaled to a maximum of 1 within each group,
# so its maximum marks the peak row of each group
p.text <- lapply(split(p.data, f = p.data$group), function(df){
  df[which.max(df$scaled), ]
})
p.text <- do.call(rbind, p.text)  # we can also get p.text with dplyr

# now add the text layer to the plot
p + annotate('text', x = p.text$x, y = p.text$y,
             label = sprintf('n = %d', p.text$n), vjust = 0)
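
Since the question also asks for label colours set via aes(), here is a hedged alternative sketch that is not part of the answer above: it builds a small label data frame (the name peak_labels is made up for illustration) with base R's density() and passes it to geom_text. It assumes that density() with default settings peaks close to where geom_density draws its peak; the two use slightly different evaluation grids, so the label position is approximate.

```r
library(ggplot2)

# Sample data from the question
set.seed(100)
df <- data.frame(ab.class = c(rep("A", 200), rep("B", 200)),
                 val = c(rnorm(200, 0, 1), rnorm(200, 1, 1)))

# One row per group: approximate peak position plus the sample-size label
peak_labels <- do.call(rbind, lapply(split(df, df$ab.class), function(d) {
  dens <- density(d$val)      # gaussian kernel, default bandwidth
  i <- which.max(dens$y)      # index of the highest density value
  data.frame(ab.class = d$ab.class[1],
             x = dens$x[i],
             y = dens$y[i],
             label = paste0("n = ", nrow(d)))
}))

# colour is mapped to ab.class, so labels follow the same default palette as fill
p_lab <- ggplot(df, aes(x = val, group = ab.class)) +
  geom_density(aes(fill = ab.class), alpha = 0.4) +
  geom_text(data = peak_labels,
            aes(x = x, y = y, label = label, colour = ab.class),
            vjust = -0.5, show.legend = FALSE)
```

Because the labels come from a data frame rather than annotate(), they generalise to any number of groups and respect colour scales added to the plot.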


