ggplot - Add regression line on a boxplot with binned (non-continuous) x-axis

Submitted by 老子叫甜甜 on 2019-11-28 12:38:58

Question


I have a dataset with this structure:

df<- data.frame (VPD.mean=rnorm(100,mean=2,sd=0.8), treatment=c("ambient","elevated"), variable=rnorm(100,mean=50,sd=10))
df$group <- with(df, as.factor (ifelse (VPD.mean>0 & VPD.mean<=1,"0-1",ifelse (
  VPD.mean>1 & VPD.mean<=1.5,"1-1.5",ifelse (
    VPD.mean >1.5 & VPD.mean<2, "1.5-2",ifelse (
      VPD.mean >=2 & VPD.mean<2.5, "2-2.5",ifelse (
        VPD.mean >=2.5 & VPD.mean <3,"2.5-3", ifelse(
          VPD.mean >=3,">3", NA)  
      )))))))
df$group<- factor(df$group,levels=c("0-1","1-1.5","1.5-2" ,"2-2.5","2.5-3",">3"))

I created a boxplot using the groups created after binning VPD.mean, and therefore the x-axis is non-continuous (see graph below):

I would also like to add a regression line (smooth), and therefore I would have to use the continuous variable (VPD.mean) instead of the binned one (groups) as x-axis. The result is not nice, because the smooth line doesn't match the x-axis of the graphs. This is the code for the ggplot:

ggplot(df[!is.na(df$group),], aes(group,variable,fill=treatment)) + 
  geom_boxplot(outlier.size = 0) + geom_smooth(aes(x=VPD.mean)) 

What's the solution to plot the geom_smooth from a different x-axis on the same graph? Thanks


Answer 1:


It is possible to do what you ask, but it is a stunningly bad idea.

library(ggplot2)

set.seed(1)  # for a reproducible example
df <- data.frame(VPD.mean = rnorm(100, mean = 2, sd = 0.8),
                 treatment = c("ambient", "elevated"),
                 variable = rnorm(100, mean = 50, sd = 10))
# cut() bins VPD.mean into right-closed intervals; values <= 0 become NA
df$group <- cut(df$VPD.mean,
                breaks = c(0, seq(1, 3, by = 0.5), Inf),
                labels = c("0-1", "1-1.5", "1.5-2", "2-2.5", "2.5-3", ">3"))
ggplot(df[!is.na(df$group), ]) +
  geom_boxplot(aes(x = factor(group), y = variable, fill = treatment),
               position = position_dodge(.7), width = .8) +
  # as.integer(group) places the smooth on the factor codes (1 to 6)
  geom_smooth(aes(x = as.integer(group), y = variable, color = treatment, fill = treatment),
              method = "loess")

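This works, more or less, because ggplot uses the factor codes for the x-axis positions and the factor levels for the axis labels, and as.integer(group) returns the factor codes. The smooth is therefore fitted against the bin index (1 to 6), which spaces every bin equally, rather than against VPD.mean itself. If your bins are not all the same size (and they are not, in your case), the plot can be misleading.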
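
To make the point about factor codes and unequal bins concrete, here is a minimal base R sketch. It rebuilds the same binned factor and lists each label next to its integer code and bin width. The names breaks, codes, widths and bin_table are introduced only for this illustration and are not part of the original answer.

```r
# Rebuild the binned factor with the same breaks and labels as the answer.
set.seed(1)
df <- data.frame(VPD.mean = rnorm(100, mean = 2, sd = 0.8),
                 treatment = c("ambient", "elevated"),
                 variable = rnorm(100, mean = 50, sd = 10))
breaks <- c(0, seq(1, 3, by = 0.5), Inf)   # illustrative name
df$group <- cut(df$VPD.mean, breaks = breaks,
                labels = c("0-1", "1-1.5", "1.5-2", "2-2.5", "2.5-3", ">3"))

# Factor codes: the integer position of each observation's level.
codes <- as.integer(df$group)

# Width of each bin on the original VPD.mean scale.
widths <- diff(breaks)

# One row per bin: label, the x position the smooth sees, and the true width.
bin_table <- data.frame(label = levels(df$group),
                        code  = seq_along(levels(df$group)),
                        width = widths)
print(bin_table)
```

The code column is what geom_smooth sees as x, while the width column shows that the first bin is twice as wide as the middle ones and the last bin is open-ended. Equal spacing on the axis therefore does not correspond to equal steps in VPD.mean.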



Source: https://stackoverflow.com/questions/21296789/ggplot-add-regression-line-on-a-boxplot-with-binned-non-continuous-x-axis
