Predict using felm output with standard errors

前端 未结 2 1668
春和景丽
春和景丽 2020-12-30 10:35

Is there way to get predict behavior with standard errors from lfe::felm if the fixed effects are swept out using the projection method in felm? Th

相关标签:
2条回答
  • 2020-12-30 11:00

    From your first model predict(.) yields this:

    #        fit      lwr      upr
    # 1 18436.18 2339.335 34533.03
    

    Following 李哲源 we can achieve these results manually, too.

    beta.hat.1 <- coef(model1)  # save coefficients
    # model matrix: age=40, nodeg = 0, marr=1:
    X.1 <- cbind(1, matrix(c(40, 0, 1), ncol=3))  
    pred.1 <- as.numeric(X.1 %*% beta.hat.1) # prediction
    V.1 <- vcov(model1)  # save var-cov matrix
    se2.1 <- unname(rowSums((X.1 %*% V.1) * X.1))  # prediction var
    alpha.1 <- qt((1-0.95)/2, df = model1$df.residual)  # 5 % level
    pred.1 + c(alpha.1, -alpha.1) * sqrt(se2.1)  # 95%-CI
    # [1] 18258.18 18614.18
    sigma2.1 <- sum(model1$residuals ^ 2) / model1$df.residual  # sigma.sq
    PI.1 <- pred.1 + c(alpha.1, -alpha.1) * sqrt(se2.1 + sigma2.1) # prediction interval
    matrix(c(pred.1, PI.1), nrow = 1, dimnames = list(1, c("fit", "lwr", "upr")))
    #        fit      lwr      upr
    # 1 18436.18 2339.335 34533.03
    

    Now, your linked example applied to multiple FE, we get this results:

    lm.model <- lm(data=demeanlist(cps1[, c(8, 2)], 
                                   list(as.factor(cps1$nodeg), 
                                        as.factor(cps1$marr))), re74 ~ age)
    fe <- getfe(model2)
    predict(lm.model, newdata = data.frame(age = 40)) + fe$effect[fe$idx=="1"]
    # [1] 15091.75 10115.21
    

    The first value is with and the second without added FE (try fe$effect[fe$idx=="1"]).

    Now we're following the manual approach above.

    beta.hat <- coef(model2)  # coefficient
    x <- 40  # age = 40
    pred <- as.numeric(x %*% beta.hat)  # prediction
    V <- model2$vcv  # var/cov
    se2 <- unname(rowSums((x %*% V) * x))  # prediction var
    alpha <- qt((1-0.95)/2, df = model2$df.residual)  # 5% level
    pred + c(alpha, -alpha) * sqrt(se2)  # CI
    # [1]  9599.733 10630.697
    sigma2 <- sum(model2$residuals ^ 2) / model2$df.residual  # sigma^2
    PI <- pred + c(alpha, -alpha) * sqrt(se2 + sigma2)  # PI
    matrix(c(pred, PI), nrow = 1, dimnames = list(1, c("fit", "lwr", "upr")))  # output
    #        fit       lwr      upr
    # 1 10115.21 -5988.898 26219.33
    

    As we see, the fit is the same as the linked example approach, but now with prediction interval. (Disclaimer: The logic of the approach should be straightforward, the values of the PI should still be evaluated, e.g. in Stata with reghdfe.)

    Edit: In case you want to achieve exactly the same output from felm() which predict.lm() yields with the linear model1, you simply need to "include" again the fixed effects in your model (see model3 below). Just follow the same approach then. For more convenience you easily could wrap it into a function.

    library(DAAG)
    library(lfe)
    
    model3 <- felm(data = cps1, re74 ~ age + nodeg + marr)
    
    pv <- c(40, 0, 1)  # prediction x-values
    
    predict0.felm <- function(mod, pv.=pv) {
      beta.hat <- coef(mod)  # coefficient
      x <- cbind(1, matrix(pv., ncol=3))  # prediction vector
      pred <- as.numeric(x %*% beta.hat)  # prediction
      V <- mod[['vcv'] ] # var/cov
      se2 <- unname(rowSums((x %*% V) * x))  # prediction var
      alpha <- qt((1-0.95)/2, df = mod[['df.residual']])  # 5% level
      CI <- structure(pred + c(alpha, -alpha) * sqrt(se2), 
                      names=c("CI lwr", "CI upr"))  # CI
      sigma2 <- sum(mod[['residuals']] ^ 2) / mod[['df.residual']] # sigma^2
      PI <- pred + c(alpha, -alpha) * sqrt(se2 + sigma2)  # PI
      mx <- matrix(c(pred, PI), nrow = 1, 
                   dimnames = list(1, c("PI fit", "PI lwr", "PI upr")))  # output
      list(CI, mx)
    }
    
    predict0.felm(model3)[[2]]
    #     PI fit   PI lwr   PI upr
    # 1 18436.18 2339.335 34533.03
    

    By this with felm() you can achieve the same prediction interval as with predict.lm().

    0 讨论(0)
  • 2020-12-30 11:19

    After conversations with several people, I don't believe it is possible to obtain an estimate the distribution of yhat=Xb (where X includes both the covariates and the fixed effects) directly from felm, which is what this question boils down to. It is possible bootstrap them, however. The following code does so in parallel. There is scope for performance improvements, but this gives the general idea.

    Note: here I do not compute full prediction interval, just the SEs on Xb, but obtaining the prediction interval is straightforward - just add the root of sigma^2 to the SE.

    library(DAAG)
    library(lfe)
    library(parallel)
    
    model1 <- lm(data = cps1, re74 ~ age + nodeg + marr)
    yhat_lm <- predict(model1, newdata = data.frame(age=40, nodeg = 0, marr=1), se.fit = T)
    
    set.seed(42)
    boot_yhat <- function(b) {
      print(b)
      n <- nrow(cps1)
      boot <- cps1[sample(1:n, n, replace=T),]
    
      lm.model <- lm(data=demeanlist(boot[, c("re74", "age")], list(factor(boot$nodeg), factor(boot$marr))), 
                     formula = re74 ~ age)
      fe <- getfe(felm(data = boot, re74 ~ age | nodeg + marr))
    
      bootResult <- predict(lm.model, newdata = data.frame(age = 40)) + 
        fe$effect[fe$fe == "nodeg" & fe$idx==0] + 
        fe$effect[fe$fe == "marr" & fe$idx==1]
      return(bootResult)
    }
    
    B = 1000
    yhats_boot <- mclapply(1:B, boot_yhat)
    
    plot(density(rnorm(10000, mean=yhat_lm$fit, sd=yhat_lm$se.fit)))
    lines(density(yhats), col="red")
    
    0 讨论(0)
提交回复
热议问题