Predict using felm output with standard errors

前端未结

关注

 2  1674

Is there way to get predict behavior with standard errors from lfe::felm if the fixed effects are swept out using the projection method in felm? Th

相关标签:

2条回答

长情又很酷

2020-12-30 11:00

From your first model predict(.) yields this:

#        fit      lwr      upr
# 1 18436.18 2339.335 34533.03

Following 李哲源 we can achieve these results manually, too.

beta.hat.1 <- coef(model1)  # save coefficients
# model matrix: age=40, nodeg = 0, marr=1:
X.1 <- cbind(1, matrix(c(40, 0, 1), ncol=3))  
pred.1 <- as.numeric(X.1 %*% beta.hat.1) # prediction
V.1 <- vcov(model1)  # save var-cov matrix
se2.1 <- unname(rowSums((X.1 %*% V.1) * X.1))  # prediction var
alpha.1 <- qt((1-0.95)/2, df = model1$df.residual)  # 5 % level
pred.1 + c(alpha.1, -alpha.1) * sqrt(se2.1)  # 95%-CI
# [1] 18258.18 18614.18
sigma2.1 <- sum(model1$residuals ^ 2) / model1$df.residual  # sigma.sq
PI.1 <- pred.1 + c(alpha.1, -alpha.1) * sqrt(se2.1 + sigma2.1) # prediction interval
matrix(c(pred.1, PI.1), nrow = 1, dimnames = list(1, c("fit", "lwr", "upr")))
#        fit      lwr      upr
# 1 18436.18 2339.335 34533.03

Now, your linked example applied to multiple FE, we get this results:

lm.model <- lm(data=demeanlist(cps1[, c(8, 2)], 
                               list(as.factor(cps1$nodeg), 
                                    as.factor(cps1$marr))), re74 ~ age)
fe <- getfe(model2)
predict(lm.model, newdata = data.frame(age = 40)) + fe$effect[fe$idx=="1"]
# [1] 15091.75 10115.21

The first value is with and the second without added FE (try fe$effect[fe$idx=="1"]).

Now we're following the manual approach above.

beta.hat <- coef(model2)  # coefficient
x <- 40  # age = 40
pred <- as.numeric(x %*% beta.hat)  # prediction
V <- model2$vcv  # var/cov
se2 <- unname(rowSums((x %*% V) * x))  # prediction var
alpha <- qt((1-0.95)/2, df = model2$df.residual)  # 5% level
pred + c(alpha, -alpha) * sqrt(se2)  # CI
# [1]  9599.733 10630.697
sigma2 <- sum(model2$residuals ^ 2) / model2$df.residual  # sigma^2
PI <- pred + c(alpha, -alpha) * sqrt(se2 + sigma2)  # PI
matrix(c(pred, PI), nrow = 1, dimnames = list(1, c("fit", "lwr", "upr")))  # output
#        fit       lwr      upr
# 1 10115.21 -5988.898 26219.33

As we see, the fit is the same as the linked example approach, but now with prediction interval. (Disclaimer: The logic of the approach should be straightforward, the values of the PI should still be evaluated, e.g. in Stata with reghdfe.)

Edit: In case you want to achieve exactly the same output from felm() which predict.lm() yields with the linear model1, you simply need to "include" again the fixed effects in your model (see model3 below). Just follow the same approach then. For more convenience you easily could wrap it into a function.

library(DAAG)
library(lfe)

model3 <- felm(data = cps1, re74 ~ age + nodeg + marr)

pv <- c(40, 0, 1)  # prediction x-values

predict0.felm <- function(mod, pv.=pv) {
  beta.hat <- coef(mod)  # coefficient
  x <- cbind(1, matrix(pv., ncol=3))  # prediction vector
  pred <- as.numeric(x %*% beta.hat)  # prediction
  V <- mod[['vcv'] ] # var/cov
  se2 <- unname(rowSums((x %*% V) * x))  # prediction var
  alpha <- qt((1-0.95)/2, df = mod[['df.residual']])  # 5% level
  CI <- structure(pred + c(alpha, -alpha) * sqrt(se2), 
                  names=c("CI lwr", "CI upr"))  # CI
  sigma2 <- sum(mod[['residuals']] ^ 2) / mod[['df.residual']] # sigma^2
  PI <- pred + c(alpha, -alpha) * sqrt(se2 + sigma2)  # PI
  mx <- matrix(c(pred, PI), nrow = 1, 
               dimnames = list(1, c("PI fit", "PI lwr", "PI upr")))  # output
  list(CI, mx)
}

predict0.felm(model3)[[2]]
#     PI fit   PI lwr   PI upr
# 1 18436.18 2339.335 34533.03

By this with felm() you can achieve the same prediction interval as with predict.lm().

0 讨论(0)

清歌不尽

2020-12-30 11:19

After conversations with several people, I don't believe it is possible to obtain an estimate the distribution of yhat=Xb (where X includes both the covariates and the fixed effects) directly from felm, which is what this question boils down to. It is possible bootstrap them, however. The following code does so in parallel. There is scope for performance improvements, but this gives the general idea.

Note: here I do not compute full prediction interval, just the SEs on Xb, but obtaining the prediction interval is straightforward - just add the root of sigma^2 to the SE.

library(DAAG)
library(lfe)
library(parallel)

model1 <- lm(data = cps1, re74 ~ age + nodeg + marr)
yhat_lm <- predict(model1, newdata = data.frame(age=40, nodeg = 0, marr=1), se.fit = T)

set.seed(42)
boot_yhat <- function(b) {
  print(b)
  n <- nrow(cps1)
  boot <- cps1[sample(1:n, n, replace=T),]

  lm.model <- lm(data=demeanlist(boot[, c("re74", "age")], list(factor(boot$nodeg), factor(boot$marr))), 
                 formula = re74 ~ age)
  fe <- getfe(felm(data = boot, re74 ~ age | nodeg + marr))

  bootResult <- predict(lm.model, newdata = data.frame(age = 40)) + 
    fe$effect[fe$fe == "nodeg" & fe$idx==0] + 
    fe$effect[fe$fe == "marr" & fe$idx==1]
  return(bootResult)
}

B = 1000
yhats_boot <- mclapply(1:B, boot_yhat)

plot(density(rnorm(10000, mean=yhat_lm$fit, sd=yhat_lm$se.fit)))
lines(density(yhats), col="red")

0 讨论(0)