Problems displaying LOESS regression line and confidence interval

给你一囗甜甜゛ 提交于 2019-12-04 02:13:51

问题


I am having some issues trying to compete a LOESS regression with a data set. I have been able to properly create the line, but I am unable to get it to plot correctly.

I ran through the data like this.

animals.lo <- loess(X15p5 ~ Period, animals, weights = n.15p5)    
animals.lo    
summary(animals.lo)    
plot(X15p5~ Period, animals)    
lines(animals$X15p5, animals.lo, col="red")  

At this point I received an error

"Error in xy.coords(x, y) : 'x' and 'y' lengths differ"

I searched around and read that this issue could be due to the points needing to be ordered, so I proceeded.

a <- order(animals$Period)    
lines(animals$X15p5[a], animals.lo$Period[a], col="red", lwd=3)  

There were no errors at this point, but the LOESS line was still not showing up in the plot. The points were displayed correctly, but not the line.

This is similar to the data set I am using...

structure(list(Site = c("Cat", "Dog", "Bear", "Chicken", "Cow",
"Bird", "Tiger", "Lion", "Leopard", "Wolf", "Puppy", "Kitten", 
"Emu", "Ostrich", "Elephant", "Sheep", "Goat", "Fish", "Iguana", 
"Monkey", "Gorilla", "Baboon", "Lemming", "Mouse", "Rat", "Hamster", 
"Eagle", "Parrot", "Crow", "Dove", "Falcon", "Hawk", "Sparrow", 
"Kite", "Chimpanzee", "Giraffe", "Bear", "Donkey", "Mule", "Horse", 
"Zebra", "Ox", "Snake", "Cobra", "Iguana", "Lizard", "Fly", "Mosquito", 
"Llama", "Butterfly", "Moth", "Worm", "Centipede", "Unicorn", 
"Pegasus", "Griffin", "Ogre", "Monster", "Demon", "Witch", "Vampire", 
"Mummy", "Ghoul", "Zombie"), Region = c(6L, 4L, 4L, 5L, 7L, 6L, 
2L, 4L, 6L, 7L, 7L, 4L, 6L, 4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 8L, 4L, 6L, 6L, 
4L, 2L, 7L, 4L, 2L, 2L, 7L, 3L, 4L, 7L, 4L, 4L, 4L, 7L, 7L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 8L), Period = c(-2715, -3500, 
-3500, -4933.333333, -2715, -2715, -2715, -3500, -2715, -4350, 
-3500, -3500, -2950, -4350, -3650, -3500, -3500, -2715, -3650, 
-4350, -3500, -3500, -3400, -4350, -3500, -3500, -4350, -3900, 
-3808.333333, -4233.333333, -3500, -3900, -3958.333333, -3900, 
-3500, -3500, -3500, -2715, -3650, -2715, -2715, -2715, -2715, 
-3500, -2715, -2715, -3500, -4350, -3650, -3650, -4350, -5400, 
-3500, -3958.333333, -3400, -3400, -4350, -3600, -4350, -3650, 
-3500, -2715, -5400, -3500), Value = c(0.132625995, 0.163120567, 
0.228840125, 0.154931973, 0.110047847, 0.054347826, 0.188679245, 
0.245014245, 0.128378378, 0.021428571, 0.226277372, 0.176923077, 
0.104938272, 0.17659805, 0.143798024, 0.086956522, 0.0625, 0.160714286, 
0, 0.235588972, 0, 0, 0.208333333, 0.202247191, 0.364705882, 
0.174757282, 0, 0.4, 0.1, 0.184027778, 0.232876712, 0.160493827, 
0.74702381, 0.126984127, 0.080645161, 0.06557377, 0, 0.057692308, 
0.285714286, 0.489361702, 0.108695652, 0.377777778, 0, 0.522727273, 
0.024390244, 0.097560976, 0.275, 0, 0.0625, 0.255319149, 0.135135135, 
0.216216216, 0.222222222, 0.296296296, 0.222222222, 0.146341463, 
0.09375, 0.125, 0.041666667, 0.078947368, 0.2, 0.137931034, 0.571428571, 
0.142857143), Sample_size = c(188.5, 105.75, 79.75, 70, 52.25, 
46, 39.75, 39, 37, 35, 34.25, 32.5, 32.4, 30.76666667, 30.36666667, 
28.75, 28, 28, 28, 26.6, 25, 25, 24, 22.25, 21.25, 20.6, 20, 
20, 20, 19.2, 18.25, 18, 18, 16.8, 15.5, 15.25, 15, 13, 12.6, 
11.75, 11.5, 11.25, 11, 11, 10.25, 10.25, 10, 10, 9.6, 9.4, 9.25, 
9.25, 9, 9, 9, 8.2, 8, 8, 8, 7.6, 7.5, 7.25, 7, 7), Sample_sub = c(25, 
17.25, 18.25, 10.8452381, 5.75, 2.5, 7.5, 9.555555556, 4.75, 
0.75, 7.75, 5.75, 3.4, 5.433333333, 4.366666667, 2.5, 1.75, 4.5, 
0, 6.266666667, 0, 0, 5, 4.5, 7.75, 3.6, 0, 8, 2, 3.533333333, 
4.25, 2.888888889, 13.44642857, 2.133333333, 1.25, 1, 0, 0.75, 
3.6, 5.75, 1.25, 4.25, 0, 5.75, 0.25, 1, 2.75, 0, 0.6, 2.4, 1.25, 
2, 2, 2.666666667, 2, 1.2, 0.75, 1, 0.333333333, 0.6, 1.5, 1, 
4, 1)), .Names = c("Site", "Region", "Period", "Value", "Sample_size", 
"Sample_sub"), class = "data.frame", row.names = c(NA, -64L))

I have been working for this a while and trying to read up as much as I can, but I haven't been able to make any additional headway. Any advice or guidance would be greatly appreciated.


Follow-up on adding confidence interval to plot

I have been trying to add in confidence intervals following another example found on the site on this page How to get the confidence intervals for LOWESS fit using R? .

The example given on that page is:

plot(cars)
plx<-predict(loess(cars$dist ~ cars$speed), se=T)

lines(cars$speed,plx$fit)
lines(cars$speed,plx$fit - qt(0.975,plx$df)*plx$se, lty=2)
lines(cars$speed,plx$fit + qt(0.975,plx$df)*plx$se, lty=2)  

I adapted that as this:

plot(X15p5 ~ Period, animals)
animals.lo2<-predict(loess(animals$X15p5 ~ animals$Period), se=T)
a <- order(animals$Period)
lines(animals$Period[a],animals.lo2$fit, col="red", lwd=3)
lines(animals$Period[a],animals.lo2$fit - qt(0.975,animals.lo2$df)*animals.lo2$se, lty=2)
lines(animals$Period[a],animals.lo2$fit + qt(0.975,animals.lo2$df)*animals.lo2$se, lty=2)

Although this does provide confidence intervals, the regression line is all wrong. I'm not sure if it is an issue with the predict function, or another issue. Thanks again!


回答1:


correct code

I searched around and read that this issue could be due to the points needing to be ordered, so I proceeded.

No, no. The ordering issue is not related to the error you see. To overcome the error, You need to replace

lines(animals$X15p5, animals.lo, col="red") 

with

lines(animals$Period, animals.lo$fitted, col="red") 

Here are reasons:

  1. loess returns a list of objects, not a single vector. See str(animals.lo) or names(animals.lo).
  2. why do you use animals$X15p5 as x-axis? You fit your model: X15p5 ~ Period, so x-axis should be Period.

about reordering

You need to do ordering, because by default, R lines up points in order. Take this as an example:

set.seed(0); x <- runif(100, 0, 10)  ## x is not in order
set.seed(1); y <- sqrt(x)  ## plot curve y = sqrt(x)
par(mfrow = c(1,2))
plot(x, y, type = "l")  ## this is a mess!!
reorder <- order(x)
plot(x[reorder], y[reorder], type = "l")  ## this is nice

Similarly, do:

a <- order(animals$Period)    
lines(animals$Period[a], animals.lo$fitted[a], col="red", lwd=3)

follow-up on confidence interval

Try this:

plot(X15p5 ~ Period, animals)
animals.lo <- loess(X15p5 ~ Period, animals)
pred <- predict(animals.lo, se = TRUE)
a <- order(animals$Period)
lines(animals$Period[a], pred$fit[a], col="red", lwd=3)
lines(animals$Period[a], pred$fit[a] - qt(0.975, pred$df)*pred$se[a],lty=2)
lines(animals$Period[a], pred$fit[a] - qt(0.975, pred$df)*pred$se[a],lty=2)

You forgot about reordering again. You need to reorder both fitted values, as well as standard errors.

Now, the dist ~ speed model for cars data has no need for reordering. Because:

is.unsorted(cars$speed)  ## FALSE

Yes, data are already sorted there.

Note I have made two other changes to your code:

  1. I have separated loess call and predict call; Maybe you don't need to do this, but it is generally a good habit to separate model fitting and model prediction, and keeps a copy of both objects.
  2. I have changed loess(animals$X15p5 ~ animals$Period) to loess(X15p5 ~ Period, animals). It is a bad habit to use $ sign in specifying model formula. I have another answer at https://stackoverflow.com/a/37307270/4891738 showing the draw back of such style. You can read on the "update" section over there. I have used the glm as an example, but for lm, glm, loess, things are the same.


来源:https://stackoverflow.com/questions/37499924/problems-displaying-loess-regression-line-and-confidence-interval

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!