Statsmodels: Calculate fitted values and R squared

Asked by 太阳男子 on 2020-12-15 11:54

I am running a regression as follows (df is a pandas dataframe):

import statsmodels.api as sm
est = sm.OLS(df['p'], df[['e', ...]]).fit()  # exogenous column list is truncated in the original post

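For reference, a minimal runnable sketch of the snippet above, using synthetic data and only the 'e' column (the remaining column names are cut off in the post, so the DataFrame here is hypothetical):

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
df = pd.DataFrame({'e': rng.normal(size=100)})
df['p'] = 2.0 * df['e'] + rng.normal(size=100)

# No constant column is added, so statsmodels reports the uncentred R-squared
est = sm.OLS(df['p'], df[['e']]).fit()
print(est.summary())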
1 Answer
  • Answered 2020-12-15 12:12

    If you do not include an intercept (a constant explanatory variable) in your model, statsmodels computes R-squared from the un-centred total sum of squares, i.e.

    tss = (ys ** 2).sum()  # un-centred total sum of squares
    

    as opposed to

    tss = ((ys - ys.mean())**2).sum()  # centred total sum of squares
    

    As a result, R-squared can come out much higher than the familiar centred version.
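    A small numeric sketch of the difference (the arrays below are toy values, not taken from the question):

    import numpy as np

    ys = np.array([10.0, 11.0, 12.0, 13.0])   # toy response with a large mean
    resid = np.array([0.5, -0.5, 0.5, -0.5])  # pretend model residuals
    ssr = (resid ** 2).sum()                  # 1.0

    tss_uncentred = (ys ** 2).sum()              # 534.0
    tss_centred = ((ys - ys.mean()) ** 2).sum()  # 5.0

    print(1 - ssr / tss_uncentred)  # ~0.998, looks near-perfect
    print(1 - ssr / tss_centred)    # 0.8, the familiar centred R-squared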

    This is mathematically correct, because R-squared should indicate how much of the variation is explained by the full model compared to a reduced model. If you define your model as:

    ys = beta1 * xs + beta0 + noise
    

    then the reduced model can be ys = beta0 + noise, where the estimate for beta0 is the sample average, so we have noise = ys - ys.mean(). That is where the de-meaning comes from in a model with an intercept.
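    This is easy to verify with an intercept-only fit: the estimated beta0 equals the sample mean, so the reduced model's residual sum of squares equals the centred total sum of squares (toy values again):

    import numpy as np
    import statsmodels.api as sm

    ys = np.array([10.0, 11.0, 12.0, 13.0])
    const = np.ones_like(ys)             # intercept-only design matrix

    reduced = sm.OLS(ys, const).fit()
    print(reduced.params[0], ys.mean())  # both 11.5: beta0 is the sample mean
    print((reduced.resid ** 2).sum())    # 5.0, the centred total sum of squares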

    But from a model like:

    ys = beta * xs + noise
    

    you may only reduce to ys = noise. Since the noise is assumed to be zero-mean, you may not de-mean ys. Therefore, the unexplained variation of the reduced model is the un-centred total sum of squares.

    This is documented in the statsmodels documentation under the rsquared item. Set yBar equal to zero in that formula, and you should get the same number.
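    As a quick check of that, here is a sketch (with synthetic data) that fits a model without a constant and recomputes R-squared with yBar fixed at zero; the manual value matches what statsmodels reports:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    xs = rng.normal(size=100)
    ys = 2.0 * xs + rng.normal(size=100)

    est = sm.OLS(ys, xs).fit()              # no constant in the design matrix
    manual = 1 - est.ssr / (ys ** 2).sum()  # R-squared with yBar set to zero
    print(est.rsquared, manual)             # the two values agree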
