Question
Say I have a very simple model:
library(foreign)
smoke <- read.dta("http://fmwww.bc.edu/ec-p/data/wooldridge/smoke.dta")
smoking.reg <- lm(cigs ~ educ, data=smoke)
AIC(smoking.reg)
BIC(smoking.reg)
In R I get the following results:
> AIC(smoking.reg)
[1] 6520.26
> BIC(smoking.reg)
[1] 6534.34
However, running the same regression in Stata
use http://fmwww.bc.edu/ec-p/data/wooldridge/smoke.dta
reg cigs educ
followed by
estat ic
returns different AIC and BIC values.
How can I get R to return exactly the same values as does Stata for AIC and BIC?
Answer 1:
AIC is calculated as -2 * log-likelihood + 2 * k, where k is the number of estimated parameters.
BIC is calculated as -2 * log-likelihood + log(n) * k, where n is the sample size.
Your linear regression has three parameters (two coefficients and the variance), so you can calculate AIC and BIC as
ll <- logLik(smoking.reg)
aic <- -2 * ll + 2 * 3                  # 6520.26
bic <- -2 * ll + log(nrow(smoke)) * 3   # 6534.34
(As Ben Bolker mentioned in the comments, the logLik object has several attributes that you can use to get the number of parameters ("df") and the number of observations ("nobs"); see attr(ll, "df") and attr(ll, "nobs").)
Stata does not count the variance parameter; it counts only the coefficients. This is usually not a problem, because information criteria are typically used to compare models (AIC_of_model1 - AIC_of_model2), so if the parameter is omitted from both calculations the difference is unchanged. In Stata the calculation is
aic <- -2 * ll + 2 * 2                  # 6518.26
bic <- -2 * ll + log(nrow(smoke)) * 2   # 6527.647
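The two calculations above can be wrapped in a small helper so that the parameter count and sample size are read from the logLik attributes rather than typed in by hand. This is only a sketch: stata_ic is a hypothetical name, not a function from base R or Stata, and it assumes an ordinary lm fit where the only parameter besides the coefficients is the residual variance. The built-in mtcars data is used purely for illustration.

```r
# Hypothetical helper (not part of base R or Stata): information criteria
# that count only the regression coefficients, as described above for Stata.
stata_ic <- function(model) {
  ll <- logLik(model)
  n  <- attr(ll, "nobs")      # number of observations used in the fit
  k  <- attr(ll, "df") - 1    # drop the residual-variance parameter
  ll <- as.numeric(ll)        # strip the logLik class and attributes
  c(AIC = -2 * ll + 2 * k,
    BIC = -2 * ll + log(n) * k)
}

# Illustration on the built-in mtcars data (an assumption; any lm fit works)
fit1 <- lm(mpg ~ wt, data = mtcars)
fit2 <- lm(mpg ~ wt + hp, data = mtcars)

ic1 <- stata_ic(fit1)
ic2 <- stata_ic(fit2)

# R's values exceed the coefficient-only values by a constant:
# 2 for AIC and log(n) for BIC
aic_offset <- AIC(fit1) - ic1[["AIC"]]
bic_offset <- BIC(fit1) - ic1[["BIC"]]

# Differences between models are the same under either convention
aic_diff_r     <- AIC(fit1) - AIC(fit2)
aic_diff_stata <- ic1[["AIC"]] - ic2[["AIC"]]
```

With the model from the question, stata_ic(smoking.reg) should reproduce the 6518.26 and 6527.647 shown above, provided no rows were dropped for missing values.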
Source: https://stackoverflow.com/questions/62307197/how-to-get-the-same-values-for-aic-and-bic-in-r-as-in-stata