I have a scatter plot, and I want to know how I can find the genes above and below the confidence interval lines.
EDIT: Reproducible examp
This solution takes advantage of the hard work ggplot2 does for you:
library(ggplot2)
library(sp)

# df is the question's data frame; it needs mpg and cyl columns
# we have to build the plot first so ggplot can do the calculations
gg <- ggplot(df, aes(mpg, cyl)) +
  geom_point() +
  geom_smooth()

# do the calculations
gb <- ggplot_build(gg)

# get the CI data (layer 2 is the smoother: x, y, ymin, ymax)
p <- gb$data[[2]]

# make a polygon out of it: along the lower edge, then back along the upper edge
poly <- data.frame(
  x = c(p$x[1], p$x, p$x[length(p$x)], rev(p$x)),
  y = c(p$ymax[1], p$ymin, p$ymax[length(p$x)], rev(p$ymax))
)

# test for original values in said polygon and add that to orig data
# so we can color by it
df$in_ci <- point.in.polygon(df$mpg, df$cyl, poly$x, poly$y)

# re-do the plot with the new data
ggplot(df, aes(mpg, cyl)) +
  geom_point(aes(color = factor(in_ci))) +
  geom_smooth()
It needs a bit of tweaking (i.e., that last point getting a value of 2), but I'm limited on time. Note that the point.in.polygon return values are:
0: point is strictly exterior to pol
1: point is strictly interior to pol
2: point lies on the relative interior of an edge of pol
3: point is a vertex of pol
So it should be easy to just change the code to TRUE/FALSE depending on whether the value is 0 or not.
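A minimal sketch of that TRUE/FALSE conversion follows, together with one possible way to split the outside points into "above" and "below", which is what the question asks for. The above/below step is not part of the approach described above: it is an assumption that linearly interpolating the band limits with base R's approx() is accurate enough. mtcars stands in for the question's df only so the sketch runs on its own, and the position column is a hypothetical name.

```r
library(ggplot2)
library(sp)

# Assumption: mtcars stands in for the question's df
df <- mtcars[, c("mpg", "cyl")]

gg <- ggplot(df, aes(mpg, cyl)) + geom_point() + geom_smooth()

# Smoother layer data (x, y, ymin, ymax); same as ggplot_build(gg)$data[[2]]
p <- layer_data(gg, 2)

# Closed polygon: lower edge left to right, then upper edge right to left
poly <- data.frame(x = c(p$x, rev(p$x)), y = c(p$ymin, rev(p$ymax)))

# 0 means strictly outside; 1, 2 and 3 are all treated as inside here
df$in_ci <- point.in.polygon(df$mpg, df$cyl, poly$x, poly$y) != 0

# Band limits at each observed x (rule = 2 clamps at the ends of the band)
lower <- approx(p$x, p$ymin, xout = df$mpg, rule = 2)$y
upper <- approx(p$x, p$ymax, xout = df$mpg, rule = 2)$y

# Hypothetical column: where each point sits relative to the band
df$position <- ifelse(df$cyl > upper, "above",
                      ifelse(df$cyl < lower, "below", "inside"))
```

Mapping color = position instead of factor(in_ci) in the final plot would then color the three groups, and df[df$position == "above", ] would list the points above the band.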