Find points over and under the confidence interval when using geom_stat / geom_smooth in ggplot2

轮回少年 2020-12-17 03:19

I have a scatter plot, and I want to know how I can find the genes above and below the confidence interval lines.


EDIT: Reproducible examp

3 Answers
  •  南方客 (original poster)
    2020-12-17 03:47

    This solution takes advantage of the hard work ggplot2 does for you:

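    library(ggplot2)
    library(sp)
    
    # df is the question's data frame with numeric mpg and cyl columns
    # (assumption: something like df <- mtcars would work here)
    
    # we have to build the plot first so ggplot can do the calculations
    gg <- ggplot(df, aes(mpg, cyl)) +
      geom_point() +
      geom_smooth()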
    
    # do the calculations
    gb <- ggplot_build(gg)
    
    # get the CI data
    p <- gb$data[[2]]
    
    # make a polygon out of it
    poly <- data.frame(
      x=c(p$x[1],    p$x,    p$x[length(p$x)],    rev(p$x)), 
      y=c(p$ymax[1], p$ymin, p$ymax[length(p$x)], rev(p$ymax))
    )
    
    # test for original values in said polygon and add that to orig data
    # so we can color by it
    df$in_ci <- point.in.polygon(df$mpg, df$cyl, poly$x, poly$y)
    
    # re-do the plot with the new data
    ggplot(df,aes(mpg,cyl)) +
      geom_point(aes(color=factor(in_ci))) +
      geom_smooth()
    

    It needs a bit of tweaking (i.e., that last point getting a 2 value), but I'm limited on time. Note that the point.in.polygon return values are:

    • 0: point is strictly exterior to pol
    • 1: point is strictly interior to pol
    • 2: point lies on the relative interior of an edge of pol
    • 3: point is a vertex of pol

    so it should be easy to change the code to TRUE/FALSE depending on whether the value is 0 or not.
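
A minimal sketch of that TRUE/FALSE step, plus one possible way to get the above/below split the question asks for. Two assumptions are baked in: the question's data is not shown, so `mtcars` stands in for `df`, and the above/below labels come from linearly interpolating the band edges with `approx()`, which is a swapped-in technique rather than part of the polygon test. The names `band_x`, `band_y`, `lower`, `upper`, and `position` are made up for illustration.

```r
library(ggplot2)
library(sp)

# Assumption: the question's data is not shown; mtcars stands in for df
df <- mtcars[, c("mpg", "cyl")]

gg <- ggplot(df, aes(mpg, cyl)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# Layer 2 is the smooth: fitted y plus ymin/ymax on a grid of x values
p <- ggplot_build(gg)$data[[2]]

# Polygon test, collapsed to TRUE/FALSE:
# any non-zero code (interior, edge, or vertex) counts as inside the band
band_x <- c(p$x, rev(p$x))
band_y <- c(p$ymin, rev(p$ymax))
df$in_ci <- point.in.polygon(df$mpg, df$cyl, band_x, band_y) != 0

# Above/below: interpolate the band edges at each observed mpg.
# rule = 2 holds the end values so points at the range limits are not NA.
lower <- approx(p$x, p$ymin, xout = df$mpg, rule = 2)$y
upper <- approx(p$x, p$ymax, xout = df$mpg, rule = 2)$y
df$position <- ifelse(df$cyl > upper, "above",
                      ifelse(df$cyl < lower, "below", "inside"))

# Same plot as before, colored by position; print(gg_colored) draws it
gg_colored <- ggplot(df, aes(mpg, cyl)) +
  geom_point(aes(color = position)) +
  geom_smooth(method = "loess", formula = y ~ x)
```

With this, `subset(df, position == "above")` and `subset(df, position == "below")` pull out the two groups. Points lying exactly on the band boundary are labeled "inside" here, which matches treating edge and vertex codes as TRUE in the polygon version.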
