When using stargazer to create a LaTeX table on a logistic regression object the standard behaviour is to output logit-values of each model. Is it possible to get exp(logit)
There are pieces of the right answer across the various posts, but none of them seem to put it all together. Assuming the following:
glm_out <- glm(Y ~ X, data=DT, family = "binomial")
For a logistic regression, the regression coefficient (b1) is the estimated increase in the log odds of Y per unit increase in X. So, to get the odds-ratio, we just use the exp function:
OR <- exp(coef(glm_out))
# pass in coef directly
stargazer(glm_out, coef = list(OR), t.auto=F, p.auto=F)
# or, use the apply.coef option
stargazer(glm_out, apply.coef = exp, t.auto=F, p.auto=F)
You cannot simply use apply.se = exp
to get the Std. Error for the Odds Ratio
Instead, you have to use the function: Std.Error.OR = OR * SE(coef)
# define a helper function to extract SE from glm output
se.coef <- function(glm.output){sqrt(diag(vcov(glm.output)))}
# or, you can use the arm package
se.coef <- arm::se.coef
#Get the odds ratio
OR <- exp(coef(glm_out))
# Then, we can get the `StdErr.OR` by multiplying the two:
Std.Error.OR <- OR * se.coef(glm_out)
So, to get it into stargazer, we use the following:
# using Std Errors
stargazer(glm_out, coef=list(OR), se = list(Std.Error.OR), t.auto=F, p.auto=F)
Confidence intervals in an odds-ratio setting are not symmetric. So, we cannot just do ±1.96*SE(OR) to get the CI. Instead, we can compute it from the original log odds exp(coef ± 1.96*SE)
.
# Based on normal distribution to compute Wald CIs:
# we use confint.default to obtain the conventional confidence intervals
# then, use the exp function to get the confidence intervals
CI.OR <- as.matrix(exp(confint.default(glm_out)))
So, to get it into stargazer, we use the following:
# using ci.custom
stargazer(glm_out, coef=list(OR), ci.custom = list(CI.OR), t.auto=F, p.auto=F, ci = T)
# using apply.ci
stargazer(glm_out, apply.coef = exp, apply.ci = exp, t.auto=F, p.auto=F, ci = T)
Do not use the Confidence Intervals of Odds Ratios to compute significance (see note and reference at the bottom). Instead, you can do it using the log odds:
z <- coef(glm_out)/se.coef(glm_out)
And, use that to get the p.values for significance tests:
pvalue <- 2*pnorm(abs(coef(glm_out)/se.coef(glm_out)), lower.tail = F)
(source: https://data.princeton.edu/wws509/r/c3s1)
See this link for more detailed discussion on statistical testing: https://stats.stackexchange.com/questions/144603/why-do-my-p-values-differ-between-logistic-regression-output-chi-squared-test
It is important to note however, that unlike the p value, the 95% CI does not report a measure’s statistical significance. In practice, the 95% CI is often used as a proxy for the presence of statistical significance if it does not overlap the null value (e.g. OR=1). Nevertheless, it would be inappropriate to interpret an OR with 95% CI that spans the null value as indicating evidence for lack of association between the exposure and outcome. source: Explaining Odds Ratios