Adding a regression line on a ggplot

跟風遠走 提交于 2019-11-26 12:07:50

问题


I\'m trying hard to add a regression line on a ggplot. I first tried with abline but I didn\'t manage to make it work. Then I tried this...

data = data.frame(x.plot=rep(seq(1,5),10),y.plot=rnorm(50))
ggplot(data,aes(x.plot,y.plot))+stat_summary(fun.data=mean_cl_normal) +
   geom_smooth(method=\'lm\',formula=data$y.plot~data$x.plot)

But it is not working either.


回答1:


In general, to provide your own formula you should use arguments x and y that will correspond to values you provided in ggplot() - in this case x will be interpreted as x.plot and y as y.plot. More information about smoothing methods and formula you can find in help page of function stat_smooth() as it is default stat used by geom_smooth().

ggplot(data,aes(x.plot, y.plot)) +
  stat_summary(fun.data=mean_cl_normal) + 
  geom_smooth(method='lm', formula= y~x)

If you are using the same x and y values that you supplied in the ggplot() call and need to plot linear regression line then you don't need to use the formula inside geom_smooth(), just supply the method="lm".

ggplot(data,aes(x.plot, y.plot)) +
  stat_summary(fun.data= mean_cl_normal) + 
  geom_smooth(method='lm')



回答2:


As I just figured, in case you have a model fitted on multiple linear regression, the above mentioned solution won't work.

You have to create your line manually as a dataframe that contains predicted values for your original dataframe (in your case data).

It would look like this:

# read dataset
df = mtcars

# create multiple linear model
lm_fit <- lm(mpg ~ cyl + hp, data=df)
summary(lm_fit)

# save predictions of the model in the new data frame 
# together with variable you want to plot against
predicted_df <- data.frame(mpg_pred = predict(lm_fit, df), hp=df$hp)

# this is the predicted line of multiple linear regression
ggplot(data = df, aes(x = mpg, y = hp)) + 
  geom_point(color='blue') +
  geom_line(color='red',data = predicted_df, aes(x=mpg_pred, y=hp))

# this is predicted line comparing only chosen variables
ggplot(data = df, aes(x = mpg, y = hp)) + 
  geom_point(color='blue') +
  geom_smooth(method = "lm", se = FALSE)




回答3:


The obvious solution using geom_abline:

geom_abline(slope = data.lm$coefficients[2], intercept = data.lm$coefficients[1])

Where data.lm is an lm object, and data.lm$coefficients looks something like this:

data.lm$coefficients
(Intercept)    DepDelay 
  -2.006045    1.025109 

Identical in practice is using stat_function to plot the regression line as a function of x, making use of predict:

stat_function(fun = function(x) predict(data.lm, newdata = data.frame(DepDelay=x)))

This is a little less efficient since by default n=101 points are computed, but much more flexible since it will plot a prediction curve for any model that supports predict, such as non-linear npreg from package np.

Note: If you use scale_x_continuous or scale_y_continuous some values may be cutoff and thus geom_smooth may not work correctly. Use coord_cartesian to zoom instead.




回答4:


If you want to fit other type of models, like a dose-response curve using logistic models you would also need to create more data points with the function predict if you want to have a smoother regression line:

fit: your fit of a logistic regression curve

#Create a range of doses:
mm <- data.frame(DOSE = seq(0, max(data$DOSE), length.out = 100))
#Create a new data frame for ggplot using predict and your range of new 
#doses:
fit.ggplot=data.frame(y=predict(fit, newdata=mm),x=mm$DOSE)

ggplot(data=data,aes(x=log10(DOSE),y=log(viability)))+geom_point()+
geom_line(data=fit.ggplot,aes(x=log10(x),y=log(y)))



回答5:


I found this function on a blog

 ggplotRegression <- function (fit) {

    `require(ggplot2)

    ggplot(fit$model, aes_string(x = names(fit$model)[2], y = names(fit$model)[1])) + 
      geom_point() +
      stat_smooth(method = "lm", col = "red") +
      labs(title = paste("Adj R2 = ",signif(summary(fit)$adj.r.squared, 5),
                         "Intercept =",signif(fit$coef[[1]],5 ),
                         " Slope =",signif(fit$coef[[2]], 5),
                         " P =",signif(summary(fit)$coef[2,4], 5)))
    }`

once you loaded the function you could simply

ggplotRegression(fit)

you can also go for ggplotregression( y ~ x + z + Q, data)

Hope this helps.



来源:https://stackoverflow.com/questions/15633714/adding-a-regression-line-on-a-ggplot

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!