Restrict fitted regression line (abline) to range of data used in model

Posted by 拥有回忆 on 2021-02-19 23:39:06

Question


Is it possible to draw an abline of a fit only in a certain range of x-values?

I have a dataset with a linear fit of a subset of that dataset:

# The dataset:
daten <- data.frame(x = c(0:6), y = c(0.3, 0.1, 0.9, 3.1, 5, 4.9, 6.2))

# make a linear fit for the datapoints 3, 4, 5
daten_fit <- lm(formula = y~x, data = daten, subset = 3:5)

When I plot the data and draw a regression line:

plot(y ~ x, data = daten)
abline(reg = daten_fit)

The line is drawn across the full range of x-values in the original data, but I want to draw the regression line only over the subset of data that was used for fitting. Two ideas came to mind:

  1. Draw a second, thicker line that is shown only in the range 3:5. I checked the parameters for abline, lines and segments, but I could not find anything suitable.

  2. Add small ticks at the respective positions, perpendicular to the abline. I have no idea how I could do this, but it would of course be the nicer way.

Do you have any idea for a solution?


Answer 1:


One way would be to use colours to distinguish between points that are fitted and those that aren't:

daten_fit <- lm(formula = y~x, data = daten[3:5, ])

plot(y ~ x, data = daten)
points(y ~ x, data = daten[3:5, ], col="red")
abline(reg=daten_fit, col="red")


The second way is to plot the tick marks on the x-axis. These ticks are called rugs, and can be drawn using the rug function. But first you have to calculate the range:

plot(y ~ x, data = daten)
abline(reg = daten_fit, col = "red")
# mark the range of x-values used in the fit on the x-axis
rug(range(daten[3:5, 1]), lwd = 3, col = "red")
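If you would rather have the ticks sit on the line itself, perpendicular to it, as in Idea 2 of the question, one possible sketch is to borrow the error-bar trick with arrows(): with angle = 90 the arrow heads become flat bars at right angles to the shaft. The data frame `ends` and its column `yhat` are names I made up for this illustration, and the bar length of 0.08 inches is an arbitrary choice.

```r
# Data and fit from the question
daten <- data.frame(x = 0:6, y = c(0.3, 0.1, 0.9, 3.1, 5, 4.9, 6.2))
daten_fit <- lm(y ~ x, data = daten, subset = 3:5)

# Fitted values at the two ends of the x-range used in the fit
# ('ends' and 'yhat' are illustrative names)
ends <- data.frame(x = range(daten$x[3:5]))
ends$yhat <- predict(daten_fit, newdata = ends)

plot(y ~ x, data = daten)
abline(daten_fit)

# angle = 90 with code = 3 draws a flat bar at both ends of the segment,
# perpendicular to it on the device
arrows(ends$x[1], ends$yhat[1], ends$x[2], ends$yhat[2],
       length = 0.08, angle = 90, code = 3, col = "red")
```

This also overdraws the fitted range in red, so it covers a bit of Idea 1 as well.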





Answer 2:


The answer is no: it is not possible to get abline() to draw the fitted line over only the part of the plot region where the model was fitted. This is because it uses only the model coefficients to draw the line, not predictions from the model. If you look closely, you'll see that the line actually extends outside the plot region, covering the plot frame where it exits the region.
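A small sketch of what "uses only the coefficients" means in practice: passing the fitted model to abline() should give the same line as passing the intercept and slope by hand, and neither form carries any information about which x-values went into the fit. The name `cf` is just for this illustration.

```r
# Data and fit from the question
daten <- data.frame(x = 0:6, y = c(0.3, 0.1, 0.9, 3.1, 5, 4.9, 6.2))
mod <- lm(y ~ x, data = daten, subset = 3:5)

# Named coefficient vector: "(Intercept)" and "x" ('cf' is an illustrative name)
cf <- coef(mod)

plot(y ~ x, data = daten)
abline(mod)                                  # line from the model object
abline(a = cf[["(Intercept)"]], b = cf[["x"]],
       col = "red", lty = 2)                 # same line from intercept and slope
```

The dashed red line lies on top of the black one across the whole plot, which is why restricting the line needs predictions rather than abline().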

The simplest solution to such problems is to predict from the model for the regions you want.

# The dataset:
daten <- data.frame(x = c(0:6), y = c(0.3, 0.1, 0.9, 3.1, 5, 4.9, 6.2))
# make a linear fit for the datapoints 3, 4, 5
mod <- lm(y~x, data = daten, subset = 3:5)

First, we get the range of x values we want to differentiate:

xr <- with(daten, range(x[3:5]))

then we predict for a set of evenly spaced points on this range using the model:

pred <- data.frame(x = seq(from = xr[1], to = xr[2], length.out = 50))
pred <- transform(pred, yhat = predict(mod, newdata = pred))

Now plot the data and the model using abline():

plot(y ~ x, data = daten)
abline(mod)

then add in the region you want to emphasise:

lines(yhat ~ x, data = pred, col = "red", lwd = 2)

This gives a plot with the full fitted line in black and the fitted range emphasised by a thicker red line.

If your model is more complex than abline() can handle, take a slightly different strategy: predict over the range of the available, plotted data to draw the line, then pick out the interval you want to highlight. The following code does that:

## range of all `x` data
xr2 <- with(daten, range(x))
## same as before
pred <- data.frame(x = seq(from = xr2[1], to = xr2[2], length.out = 100))
pred <- transform(pred, yhat = predict(mod, newdata = pred))

## plot the data and the fitted model line
plot(y ~ x, data = daten)
lines(yhat ~ x, data = pred)

## add emphasis to the interval used in fitting
lines(yhat ~ x, data = pred, subset = x >= xr[1] & x <= xr[2],
      lwd = 2, col = "red")

Here we use the subset argument to pick out the predictions that lie in the interval used in fitting. The vector we pass to subset is a logical vector of TRUE and FALSE values indicating which rows are in the region of interest, and lines() draws a line only along those rows. The first few values are FALSE because the prediction grid starts at the smallest x, below the fitted interval:

R> head(with(pred, x >= xr[1] & x <= xr[2]))
[1] FALSE FALSE FALSE FALSE FALSE FALSE
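If the subset argument feels opaque, the same selection can be sketched with plain logical indexing. This is only an equivalent illustration, not a different method; the name `keep` is my own.

```r
# Data, fit and prediction grid as above
daten <- data.frame(x = 0:6, y = c(0.3, 0.1, 0.9, 3.1, 5, 4.9, 6.2))
mod <- lm(y ~ x, data = daten, subset = 3:5)
xr <- range(daten$x[3:5])
pred <- data.frame(x = seq(min(daten$x), max(daten$x), length.out = 100))
pred$yhat <- predict(mod, newdata = pred)

# TRUE for grid points inside the fitted interval ('keep' is an illustrative name)
keep <- pred$x >= xr[1] & pred$x <= xr[2]

plot(y ~ x, data = daten)
lines(yhat ~ x, data = pred)                                # whole fitted line
lines(pred$x[keep], pred$yhat[keep], lwd = 2, col = "red")  # emphasised part only
```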

One might wonder why I predicted at 50 or 100 evenly spaced values of the predictor variable when, in this case, we could simply have predicted at the start and end of the region of interest and joined the two points. Well, not all modelling exercises are that simple - your double-log model from a previous question is a case in point - and the generic solution I outline above will work in all cases, whereas simply joining two predictions won't.
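For completeness, the two-point shortcut for this straight-line case might look like the sketch below. It assumes the fitted relationship really is a straight line on the plotted scale; for a curved fit it would draw the chord between the end points, not the model. The name `yr` is only for illustration.

```r
# Data and fit from the question
daten <- data.frame(x = 0:6, y = c(0.3, 0.1, 0.9, 3.1, 5, 4.9, 6.2))
mod <- lm(y ~ x, data = daten, subset = 3:5)

# Predict only at the two ends of the fitted x-range ('yr' is an illustrative name)
xr <- range(daten$x[3:5])
yr <- predict(mod, newdata = data.frame(x = xr))

plot(y ~ x, data = daten)
abline(mod)
# Join the two predictions with a thicker red segment
segments(xr[1], yr[1], xr[2], yr[2], col = "red", lwd = 2)
```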

@Andrie has furnished you with a solution to Idea 2.




Answer 3:


This is a somewhat basic plotting question -- use the ylim=c(low, high) option with suitable values for low and high.

You may want to read the An Introduction to R manual that came with your R version, and the other fine contributed documentation on the CRAN site.



Source: https://stackoverflow.com/questions/6279759/restrict-fitted-regression-line-abline-to-range-of-data-used-in-model
