ggplot2 boxplot medians aren't plotting as expected

谁说我不能喝 提交于 2019-11-29 19:14:20

问题


So, I have a fairly large dataset (Dropbox: csv file) that I'm trying to plot using geom_boxplot. The following produces what appears to be a reasonable plot:

require(reshape2)
require(ggplot2)
require(scales)
require(grid)
require(gridExtra)

df <- read.csv("\\Downloads\\boxplot.csv", na.strings = "*")
df$year <- factor(df$year, levels = c(2010,2011,2012,2013,2014), labels = c(2010,2011,2012,2013,2014))

d <- ggplot(data = df, aes(x = year, y = value)) +
    geom_boxplot(aes(fill = station)) + 
    facet_grid(station~.) +
    scale_y_continuous(limits = c(0, 15)) + 
    theme(legend.position = "none"))
d

However, when you dig a little deeper, problems creep in that freak me out. When I labeled the boxplot medians with their values, the following plot results.

df.m <- aggregate(value~year+station, data = df, FUN = function(x) median(x))
d <- d + geom_text(data = df.m, aes(x = year, y = value, label = value)) 
d

The medians plotted by geom_boxplot aren't at the medians at all. The labels are plotted at the correct y-axis value, but the middle hinge of the boxplots are definitely not at the medians. I've been stumped by this for a few days now.

What is the reason for this? How can this type of display be produced with correct medians? How can this plot be debugged or diagnosed?


回答1:


The solution to this question is in the application of scale_y_continuous. ggplot2 will perform operations in the following order:

  1. Scale Transformations
  2. Statistical Computations
  3. Coordinate Transformations

In this case, because a scale transformation is invoked, ggplot2 excludes data outside the scale limits for the statistical computation of the boxplot hinges. The medians calculated by the aggregate function and used in the geom_text instruction will use the entire dataset, however. This can result in different median hinges and text labels.

The solution is to omit the scale_y_continuous instruction and instead use:

d <- ggplot(data = df, aes(x = year, y = value)) +
geom_boxplot(aes(fill = station)) + 
facet_grid(station~.) +
theme(legend.position = "none")) +
coord_cartesian(y = c(0,15))

This allows ggplot2 to calculate the boxplot hinge stats using the entire dataset, while limiting the plot size of the figure.



来源:https://stackoverflow.com/questions/29304712/ggplot2-boxplot-medians-arent-plotting-as-expected

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!