Getting adjusted r-squared value for each line in a geom_smooth gam

余生长醉 提交于 2019-12-04 18:24:28

This is not really possible, because ggplot2 throws away the fitted object. You can see this in the source here.

1. Solving the problem by patching ggplot2

One ugly workaround is to patch the ggplot2 code on the fly to print out the results. You can do this as follows. The initial assignment throws an error but things work anyways. To undo this just restart your R session.

library(ggplot2)

# assignInNamespace patches `predictdf.glm` from ggplot2 and adds 
# a line that prints the summary of the model. For some reason, this
# creates an error, but things work nonetheless.
assignInNamespace("predictdf.glm", function(model, xseq, se, level) {
  pred <- stats::predict(model, newdata = data.frame(x = xseq), se.fit = se,
                         type = "link")

  print(summary(model)) # this is the line I added

  if (se) {
    std <- stats::qnorm(level / 2 + 0.5)
    data.frame(
      x = xseq,
      y = model$family$linkinv(as.vector(pred$fit)),
      ymin = model$family$linkinv(as.vector(pred$fit - std * pred$se.fit)),
      ymax = model$family$linkinv(as.vector(pred$fit + std * pred$se.fit)),
      se = as.vector(pred$se.fit)
    )
  } else {
    data.frame(x = xseq, y = model$family$linkinv(as.vector(pred)))
  }
}, "ggplot2")

Now we can make a plot with the patched ggplot2:

ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() + geom_smooth(se = F, method = "gam", formula = y ~ s(x, bs = "cs"))

Console output:

Family: gaussian 
Link function: identity 

Formula:
y ~ s(x, bs = "cs")

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.4280     0.0365   93.91   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
       edf Ref.df     F  p-value    
s(x) 1.546      9 5.947 5.64e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.536   Deviance explained = 55.1%
GCV = 0.070196  Scale est. = 0.066622  n = 50

Family: gaussian 
Link function: identity 

Formula:
y ~ s(x, bs = "cs")

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.77000    0.03797   72.96   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
       edf Ref.df     F  p-value    
s(x) 1.564      9 1.961 8.42e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.268   Deviance explained = 29.1%
GCV = 0.075969  Scale est. = 0.072074  n = 50

Family: gaussian 
Link function: identity 

Formula:
y ~ s(x, bs = "cs")

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.97400    0.04102    72.5   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
       edf Ref.df     F p-value   
s(x) 1.279      9 1.229   0.001 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.191   Deviance explained = 21.2%
GCV = 0.088147  Scale est. = 0.08413   n = 50

Note: I do not recommend this approach.

2. Solving the problem by fitting models via tidyverse

I think it's better to just run your models separately. Doing so is quite easy with tidyverse and broom, so I'm not sure why you wouldn't want to do it.

library(tidyverse)
library(broom)
iris %>% nest(-Species) %>% 
  mutate(fit = map(data, ~mgcv::gam(Sepal.Width ~ s(Sepal.Length, bs = "cs"), data = .)),
         results = map(fit, glance),
         R.square = map_dbl(fit, ~ summary(.)$r.sq)) %>%
  unnest(results) %>%
  select(-data, -fit)

#      Species  R.square       df    logLik      AIC      BIC deviance df.residual
# 1     setosa 0.5363514 2.546009 -1.922197 10.93641 17.71646 3.161460    47.45399
# 2 versicolor 0.2680611 2.563623 -3.879391 14.88603 21.69976 3.418909    47.43638
# 3  virginica 0.1910916 2.278569 -7.895997 22.34913 28.61783 4.014793    47.72143

As you can see, the extracted R squared values are exactly the same in both cases.

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!