I need to colour datapoints that are outside of the the confidence bands on the plot below differently from those within the bands. Should I add a separate column to my dataset
Well, I thought that this would be pretty easy with ggplot2, but now I realize that I have no idea how the confidence limits for stat_smooth/geom_smooth are calculated.
Consider the following:
library(ggplot2)
pred <- as.data.frame(predict(severity.lm,level=0.95,interval="confidence"))
dat <- data.frame(diseasesev,temperature,
in_interval = diseasesev <=pred$upr & diseasesev >=pred$lwr ,pred)
ggplot(dat,aes(y=diseasesev,x=temperature)) +
stat_smooth(method='lm') + geom_point(aes(colour=in_interval)) +
geom_line(aes(y=lwr),colour=I('red')) + geom_line(aes(y=upr),colour=I('red'))
This produces: alt text http://ifellows.ucsd.edu/pmwiki/uploads/Main/strangeplot.jpg
I don't understand why the confidence band calculated by stat_smooth is inconsistent with the band calculated directly from predict (i.e. the red lines). Can anyone shed some light on this?
Edit:
figured it out. ggplot2 uses 1.96 * standard error to draw the intervals for all smoothing methods.
pred <- as.data.frame(predict(severity.lm,se.fit=TRUE,
level=0.95,interval="confidence"))
dat <- data.frame(diseasesev,temperature,
in_interval = diseasesev <=pred$fit.upr & diseasesev >=pred$fit.lwr ,pred)
ggplot(dat,aes(y=diseasesev,x=temperature)) +
stat_smooth(method='lm') +
geom_point(aes(colour=in_interval)) +
geom_line(aes(y=fit.lwr),colour=I('red')) +
geom_line(aes(y=fit.upr),colour=I('red')) +
geom_line(aes(y=fit.fit-1.96*se.fit),colour=I('green')) +
geom_line(aes(y=fit.fit+1.96*se.fit),colour=I('green'))