Remove outliers fully from multiple boxplots made with ggplot2 in R and display the boxplots in expanded format

坚强是说给别人听的谎言 提交于 2019-12-02 17:19:33

A minimal reproducible example:

library(ggplot2)
p <- ggplot(mtcars, aes(factor(cyl), mpg))
p + geom_boxplot()

Not plotting outliers:

p + geom_boxplot(outlier.shape=NA)
#Warning message:
#Removed 3 rows containing missing values (geom_point).

(I prefer to get this warning, because a year from now with a long script it would remind me that I did something special there. If you want to avoid it use Sven's solution.)

Based on suggestions by @Sven Hohenstein, @Roland and @lukeA I have solved the problem for displaying multiple boxplots in expanded form without outliers.

First plot the box plots without outliers by using outlier.colour=NA in geom_boxplot()

plt_wool <- ggplot(subset(df_mlt, value > 0), aes(x=ID1,y=value)) + 
  geom_boxplot(aes(color=factor(ID1)),outlier.colour = NA) +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", math_format(10^.x))) +
  theme_bw() +
  theme(legend.text=element_text(size=14), legend.title=element_text(size=14))+
  theme(axis.text=element_text(size=20)) +
  theme(axis.title=element_text(size=20,face="bold")) +
  labs(x = "x", y = "y",colour="legend" ) +
  annotation_logticks(sides = "rl") +
  theme(panel.grid.minor = element_blank()) +
  guides(title.hjust=0.5) +
  theme(plot.margin=unit(c(0,1,0,0),"mm"))

Then compute the lower, upper whiskers using boxplot.stats() as the code below. Since I only take into account positive values, I choose them using the condition in the subset().

yp <- subset(df, x>0)             # Choosing only +ve values in col x
sts <- boxplot.stats(yp$x)$stats  # Compute lower and upper whisker limits

Now to achieve full expanded view of the multiple boxplots, it is useful to modify the y-axis limit of the plot inside coord_cartesian() function as below,

p1 = plt_wool + coord_cartesian(ylim = c(sts[2]/2,max(sts)*1.05))

Note: The limits of y should be adjusted according to the specific case. In this case I have chosen half of lower whisker limit for ymin.

The resulting plot is below,

You can make the outliers invisible with the argument outlier.colour = NA:

geom_boxplot(aes(color = factor(ID1)), outlier.colour = NA)
ggplot(df_mlt, aes(x = ID1, y = value)) + 
  geom_boxplot(outlier.size = NA) + 
  coord_cartesian(ylim = range(boxplot(df_mlt$value, plot=FALSE)$stats)*c(.9, 1.1))

Another way to exclude outliers is to calculate them then set the y-limit on what you consider an outlier.

For example, if your upper and lower limits are Q3 + 1.5 IQR and Q1 - 1.5 IQR, then you may use:

upper.limit <- quantile(x)[4] + 1.5*IQR(x)
lower.limit <- quantile(x)[2] - 1.5*IQR(x)

Then put limits on the y-axis range:

ggplot + coord_cartesian(ylim=c(lower.limit, upper.limit))
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!