Question
Consider this simple example:
library(tidyverse)

dataframe <- tibble(x = c(1, 2, 3, 4, 5, 6),
                    y = c(12, 24, 24, 34, 12, 15))
> dataframe
# A tibble: 6 x 2
      x     y
  <dbl> <dbl>
1     1    12
2     2    24
3     3    24
4     4    34
5     5    12
6     6    15
dataframe %>%
  ggplot(aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = 'lm', formula = y ~ x)
Here the standard errors are computed with the default option (classical OLS standard errors). However, I would like to use the robust variance-covariance matrix available through the sandwich and lmtest packages, that is, vcovHC(mymodel, "HC3").
Is there a simple way to get that with the geom_smooth() function?
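For reference, the HC3 matrix that vcovHC(mymodel, "HC3") returns is the sandwich (X'X)^-1 X' diag(e_i^2 / (1 - h_ii)^2) X (X'X)^-1, where e_i are the OLS residuals and h_ii the leverages (hat values). Below is a minimal base-R sketch of that formula, assuming an unweighted, full-rank lm() fit; hc3_vcov() is a hypothetical helper, not a function from sandwich, and its result should agree with sandwich::vcovHC(mymodel, type = "HC3").

```r
# Hypothetical helper: HC3 sandwich VCOV for an unweighted, full-rank lm() fit
hc3_vcov <- function(model) {
  X <- model.matrix(model)          # n x k design matrix
  e <- residuals(model)             # OLS residuals e_i
  h <- hatvalues(model)             # leverages h_ii
  bread <- solve(crossprod(X))      # (X'X)^-1
  omega <- e^2 / (1 - h)^2          # HC3 weights on the squared residuals
  meat <- crossprod(X * omega, X)   # X' diag(omega) X (row i of X scaled by omega_i)
  bread %*% meat %*% bread
}

dataframe <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                        y = c(12, 24, 24, 34, 12, 15))
mymodel <- lm(y ~ x, data = dataframe)

V_hc3 <- hc3_vcov(mymodel)
sqrt(diag(V_hc3))  # HC3 standard errors of the intercept and slope
```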
Answer 1:
HC robust SEs (simple)
This is now easy to do thanks to the estimatr package and its lm_robust() family of functions. For example:
library(tidyverse)
library(estimatr)

dataframe %>%
  ggplot(aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = 'lm_robust', formula = y ~ x, fill = "#E41A1C") + ## Robust (HC) SEs
  geom_smooth(method = 'lm', formula = y ~ x) + ## Just for comparison
  theme_minimal()
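One detail relative to the question: lm_robust() uses HC2 standard errors by default when no clusters are given, whereas the question asks for HC3. geom_smooth() passes its method.args list on to the modelling function, so a sketch along these lines (assuming ggplot2 and estimatr are installed) should request HC3 via lm_robust()'s se_type argument:

```r
library(ggplot2)
library(estimatr)

dataframe <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                        y = c(12, 24, 24, 34, 12, 15))

p <- ggplot(dataframe, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm_robust", formula = y ~ x,
              method.args = list(se_type = "HC3"),  # HC3 instead of the default HC2
              fill = "#E41A1C") +
  theme_minimal()

print(p)
```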
HAC robust SEs (a bit more legwork)
The one caveat is that estimatr does not yet offer support for HAC (i.e. heteroskedasticity- and autocorrelation-consistent) SEs à la Newey-West. However, it is possible to obtain these manually with the sandwich package (which is essentially what the original question was asking for anyway) and then plot them using geom_ribbon().
I'll say for the record that HAC SEs don't make much sense for this particular data set, but here's an example of how you could do it, riffing off this excellent SO answer on a related topic.
reg1 <- lm(y ~ x, data = dataframe)

## Generate a prediction DF
pred_df <-
  data.frame(predict(reg1, se.fit = TRUE, interval = "confidence")) %>%
  as_tibble()
## Clean up a little bit (optional): "fit.fit" -> "fit", "fit.lwr" -> "lwr", etc.
colnames(pred_df) <- gsub("fit.", "", colnames(pred_df), fixed = TRUE)

## Get the design matrix
X_mat <- model.matrix(reg1)

## Get HAC VCOV matrix and calculate SEs
library(sandwich)
v_hac <- NeweyWest(reg1, prewhite = FALSE, adjust = TRUE) ## HAC VCOV (adjusted for small data sample)
var_fit_hac <- rowSums((X_mat %*% v_hac) * X_mat) ## Point-wise variance for predicted mean, i.e. diag(X V X')

## Add these to pred_df (df is the residual degrees of freedom column returned by predict())
pred_df <-
  pred_df %>%
  mutate(se_fit_hac = sqrt(var_fit_hac)) %>%
  mutate(
    lwr_hac = fit - qt(0.975, df = df) * se_fit_hac,
    upr_hac = fit + qt(0.975, df = df) * se_fit_hac
  )
bind_cols(
  dataframe,
  pred_df
) %>%
  ggplot(aes(x = x, y = y)) +
  geom_point() +
  geom_ribbon(aes(ymin = lwr_hac, ymax = upr_hac),
              fill = "#E41A1C", alpha = 0.3, colour = NA) + ## Robust (HAC) SEs
  geom_smooth(method = 'lm', formula = y ~ x) + ## Just for comparison
  theme_minimal()
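The rowSums() line above computes the pointwise variance of the predicted mean, Var(fit_i) = x_i' V x_i, which is the diagonal of X V X' obtained without forming the full n x n matrix. A quick base-R check of that identity, using the classical vcov(reg1) as a stand-in for V purely so the result can be compared against predict()'s se.fit (the HAC matrix simply takes its place in the real computation):

```r
dataframe <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                        y = c(12, 24, 24, 34, 12, 15))
reg1 <- lm(y ~ x, data = dataframe)
X_mat <- model.matrix(reg1)
V <- vcov(reg1)  # classical OLS VCOV, used here only because predict() can verify it

# Full n x n product, then its diagonal
var_full <- diag(X_mat %*% V %*% t(X_mat))
# Row-wise shortcut: sum over j of (X V)_ij * X_ij, i.e. x_i' V x_i
var_fast <- rowSums((X_mat %*% V) * X_mat)

se_pred <- predict(reg1, se.fit = TRUE)$se.fit
all.equal(unname(var_fast), unname(var_full))       # TRUE
all.equal(unname(sqrt(var_fast)), unname(se_pred))  # TRUE
```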
Note that you could also use this approach to manually calculate and plot other robust SE predictions (e.g. HC1, HC2, etc.) if you wished. All you need to do is swap in the relevant sandwich estimator. For instance, using vcovHC(reg1, type = "HC2") instead of NeweyWest(reg1, prewhite = FALSE, adjust = TRUE) will give you a CI identical to the HC-robust one in the first example that uses the estimatr package, because lm_robust() uses HC2 standard errors by default.
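The same recipe can be wrapped in a small function that accepts any VCOV matrix. It also addresses one limitation of the manual approach: the ribbon above is evaluated only at the observed x values, whereas geom_smooth() evaluates on a grid (80 points by default). robust_band() is a hypothetical helper, sketched under the assumption of a full-rank lm() fit; with V = vcov(model) it should reproduce predict(model, newdata, interval = "confidence").

```r
# Hypothetical helper: confidence band for the fitted mean of an lm() model,
# using any user-supplied VCOV matrix V (classical, HC*, HAC, ...)
robust_band <- function(model, V, newdata, level = 0.95) {
  tt <- delete.response(terms(model))
  # Build the design matrix for newdata the same way predict.lm() does
  mf <- model.frame(tt, newdata, xlev = model$xlevels)
  X_new <- model.matrix(tt, mf, contrasts.arg = model$contrasts)
  fit <- drop(X_new %*% coef(model))
  se <- sqrt(rowSums((X_new %*% V) * X_new))  # x_i' V x_i for each grid point
  crit <- qt(1 - (1 - level) / 2, df = df.residual(model))
  data.frame(newdata, fit = fit, se = se,
             lwr = fit - crit * se, upr = fit + crit * se)
}

dataframe <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                        y = c(12, 24, 24, 34, 12, 15))
reg1 <- lm(y ~ x, data = dataframe)
grid <- data.frame(x = seq(min(dataframe$x), max(dataframe$x), length.out = 80))

band_classical <- robust_band(reg1, vcov(reg1), grid)
head(band_classical)
```

For a robust band, pass sandwich::vcovHC(reg1, type = "HC2") or the NeweyWest() matrix as V instead of vcov(reg1), then draw it with geom_ribbon(data = band, aes(x = x, ymin = lwr, ymax = upr), inherit.aes = FALSE).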
Answer 2:
I am very new to this whole robust SE thing, but I was able to generate the following:
zz = '
x y
1 1 12
2 2 24
3 3 24
4 4 34
5 5 12
6 6 15
'
df <- read.table(text = zz, header = TRUE)
df
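library(sandwich)
library(lmtest)

lm.model <- lm(y ~ x, data = df)
coef(lm.model)

## sqrt(diag(vcovHC(...))) gives the SEs of the two coefficients, not of the
## fitted values. The SE of each fitted value is sqrt(x_i' V x_i), where x_i is
## the i-th row of the design matrix and V is the HC3 VCOV matrix.
V_hc3 <- vcovHC(lm.model, type = "HC3")
X <- model.matrix(lm.model)
se <- sqrt(rowSums((X %*% V_hc3) * X))
fit <- predict(lm.model)

## t critical value; with only n = 6 observations the normal value 1.96 is too small
crit <- qt(0.975, df = df.residual(lm.model))
predframe <- with(df, data.frame(x,
                                 y = fit,
                                 lwr = fit - crit * se,
                                 upr = fit + crit * se))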
library(ggplot2)
ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  geom_line(data = predframe) +
  geom_ribbon(data = predframe, aes(ymin = lwr, ymax = upr), alpha = 0.3)
Source: https://stackoverflow.com/questions/45313482/ggplot2-how-to-get-robust-confidence-interval-for-predictions-in-geom-smooth