jitter if multiple outliers in ggplot2 boxplot

灰色年华 2020-12-08 06:31

I am trying to find a suitable display to illustrate various properties within and across school classes. For each class there are only 15-30 data points (pupils).

Ri

5 answers
  •  广开言路
    2020-12-08 06:33

    It seems the accepted answer no longer works since ggplot2 was updated. After much searching online, I found the following at http://comments.gmane.org/gmane.comp.lang.r.ggplot2/3616 (look at Winston Chang's reply).

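    He computes the outliers separately with ddply and then plots them with

    geom_dotplot()

    after hiding the outlier points that geom_boxplot() would otherwise draw (outlier.shape = NA also works):

    geom_boxplot(outlier.colour = NA)

    Because geom_dotplot() stacks points that fall into the same bin side by side, several outliers with equal or nearly equal values no longer overplot each other.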
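    The "calculate the outliers separately" step relies on the usual boxplot rule: a value is an outlier if it lies more than 1.5 * IQR below the lower hinge or above the upper hinge. A minimal base-R sketch of that rule on a made-up vector (the data and the names y, q, fences and out are illustrative only, not from the thread):

```r
# Made-up example: ten ordinary values plus one extreme value
y <- c(1:10, 30)

# Lower and upper hinges from quantile()'s default (type 7) estimator
q <- as.numeric(quantile(y, c(0.25, 0.75)))  # 3.5 and 8.5
iqr <- q[2] - q[1]                           # 5

# Values beyond 1.5 * IQR from either hinge count as outliers
fences <- c(q[1] - 1.5 * iqr, q[2] + 1.5 * iqr)  # -4 and 16
out <- y[y < fences[1] | y > fences[2]]
print(out)  # 30
```

    This is the computation that find_outliers() in the code below performs for each group; as far as I can tell it mirrors the default (unweighted) rule in ggplot2's stat_boxplot.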
    

    Here is the full code from the URL mentioned above:

    library(ggplot2)
    library(MASS)  # Use the birthwt data set from MASS
    library(plyr)  # for ddply()

    # This returns a vector with the outliers only: values more than
    # coef * IQR below the lower hinge or above the upper hinge
    find_outliers <- function(y, coef = 1.5) {
      qs <- c(0, 0.25, 0.5, 0.75, 1)
      stats <- as.numeric(quantile(y, qs))
      iqr <- diff(stats[c(2, 4)])

      outliers <- y < (stats[2] - coef * iqr) | y > (stats[4] + coef * iqr)

      return(y[outliers])
    }

    # Find the outliers for each level of 'smoke'
    # (plyr::summarise is spelled out so that a later library(dplyr) cannot mask it)
    outlier_data <- ddply(birthwt, .(smoke), plyr::summarise, lwt = find_outliers(lwt))

    # This draws an ordinary box plot
    ggplot(birthwt, aes(x = factor(smoke), y = lwt)) + geom_boxplot()

    # This hides the boxplot's own outlier points and redraws the outliers
    # with geom_dotplot, which stacks points that share a bin side by side
    ggplot(birthwt, aes(x = factor(smoke), y = lwt)) +
      geom_boxplot(outlier.colour = NA) +
      # also consider:
      #   geom_jitter(alpha = 0.5, size = 2) +
      geom_dotplot(data = outlier_data, binaxis = "y",
                   stackdir = "center", binwidth = 4)
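    Since the question asks about jittering, here is a sketch (not from the linked thread) that draws only the outliers with geom_jitter(), jittering horizontally but not vertically so the plotted values stay exact. It also swaps the ddply() step for base R's split()/lapply(), which may be handy because plyr is no longer actively developed. The names per_group and p and the jitter width of 0.15 are arbitrary choices for illustration.

```r
library(ggplot2)
library(MASS)  # birthwt data set

# Same 1.5 * IQR rule as find_outliers() above, returning only the outlying values
find_outliers <- function(y, coef = 1.5) {
  hinges <- as.numeric(quantile(y, c(0.25, 0.75)))
  iqr <- hinges[2] - hinges[1]
  y[y < hinges[1] - coef * iqr | y > hinges[2] + coef * iqr]
}

# Base-R stand-in for the ddply() step: split lwt by smoke,
# keep each group's outliers, then stack them into one data frame
per_group <- lapply(split(birthwt$lwt, birthwt$smoke), find_outliers)
outlier_data <- data.frame(
  smoke = rep(as.integer(names(per_group)), lengths(per_group)),
  lwt   = unlist(per_group, use.names = FALSE)
)

# Hide the boxplot's own outlier points, then draw the outliers with
# horizontal-only jitter (height = 0 keeps the plotted lwt values exact)
p <- ggplot(birthwt, aes(x = factor(smoke), y = lwt)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(data = outlier_data, width = 0.15, height = 0)

print(p)
```

    Unlike the commented-out geom_jitter() line in the code above, which would jitter every pupil's point, this only spreads out the outliers, so the boxes stay uncluttered.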
    
