How to add sum to zero constraint to GLM in Python?


Here is an example to illustrate fit_constrained, using the Gaussian family since I couldn't quickly find a Poisson example with categorical variables.

import pandas
import statsmodels.api as sm
from statsmodels.formula.api import glm

# hsb2: 200 high-school students; write is the writing score, race has four levels
url = 'http://www.ats.ucla.edu/stat/data/hsb2.csv'
hsb2 = pandas.read_table(url, delimiter=",")

# One coefficient per race level, no intercept (the "- 1" drops it);
# the formula API's glm defaults to the Gaussian family with identity link
mod = glm("write ~ C(race) - 1", data=hsb2)
res = mod.fit()
print(res.summary())

Add the constraint that all coefficients sum to zero:

res_c = mod.fit_constrained('C(race)[1] + C(race)[2] + C(race)[3] + C(race)[4] = 0')
print(res_c.summary())

                 Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:                  write   No. Observations:                  200
Model:                            GLM   Df Residuals:                      197
Model Family:                Gaussian   Df Model:                            2
Link Function:               identity   Scale:                   1232.08314649
Method:                          IRLS   Log-Likelihood:                -993.41
Date:                Wed, 25 Mar 2015   Deviance:                   2.4149e+05
Time:                        16:42:37   Pearson chi2:                 2.41e+05
No. Iterations:                     1                                         
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
C(race)[1]     1.0002    221.565      0.005      0.996      -433.260   435.260
C(race)[2]   -41.1814    267.253     -0.154      0.878      -564.988   482.626
C(race)[3]    -6.3498    235.771     -0.027      0.979      -468.453   455.754
C(race)[4]    46.5311    100.184      0.464      0.642      -149.827   242.889
==============================================================================

Model has been estimated subject to linear equality constraints.
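If I remember the statsmodels API correctly, fit_constrained should also accept the constraint as a matrix/vector pair (R, q) meaning R @ params = q, instead of a string. A minimal sketch, reusing mod from above (treat the tuple form as an assumption to verify against your statsmodels version):

import numpy as np

# Same sum-to-zero constraint written as R @ params = q
# (assumes fit_constrained accepts a (R, q) tuple; check your statsmodels version)
R = np.ones((1, 4))    # one constraint row over the four race-level coefficients
q = np.zeros(1)        # right-hand side: the coefficients must sum to zero
res_c_mat = mod.fit_constrained((R, q))
print(res_c_mat.summary())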

Constraints are comma-separated and default to being equal to zero:

res_c2 = mod.fit_constrained('C(race)[1] + C(race)[2], C(race)[3] + C(race)[4]')
print(res_c2.summary())

The last call prints:

                 Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:                  write   No. Observations:                  200
Model:                            GLM   Df Residuals:                      198
Model Family:                Gaussian   Df Model:                            1
Link Function:               identity   Scale:                   1438.99574167
Method:                          IRLS   Log-Likelihood:                -1008.9
Date:                Wed, 25 Mar 2015   Deviance:                   2.8204e+05
Time:                        16:42:37   Pearson chi2:                 2.82e+05
No. Iterations:                     1                                         
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
C(race)[1]    13.6286    242.003      0.056      0.955      -460.689   487.946
C(race)[2]   -13.6286    242.003     -0.056      0.955      -487.946   460.689
C(race)[3]   -41.6606    111.458     -0.374      0.709      -260.115   176.794
C(race)[4]    41.6606    111.458      0.374      0.709      -176.794   260.115
==============================================================================

Model has been estimated subject to linear equality constraints.
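A quick way to check that the fitted coefficients actually satisfy the constraints (the sums should be zero up to floating-point noise):

# Sum-to-zero fit: all four coefficients should sum to ~0
print(res_c.params.sum())

# Comma-separated fit: each pair should sum to ~0
print(res_c2.params.iloc[:2].sum(), res_c2.params.iloc[2:].sum())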

I'm not sure how to write a patsy formula so that none of the levels is dropped when there are several categorical explanatory variables.
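As for the Poisson case mentioned at the top, the same pattern should carry over. Here is a minimal sketch with simulated count data; the data, column names, and coefficients are made up for illustration, and with the default log link the constraint applies to the log-scale coefficients:

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import glm

# Simulated count outcome with a four-level categorical predictor (levels 1..4, as in hsb2)
rng = np.random.default_rng(0)
df = pd.DataFrame({"race": rng.integers(1, 5, size=200)})
df["y"] = rng.poisson(lam=np.exp(1.0 + 0.1 * df["race"]))

mod_p = glm("y ~ C(race) - 1", data=df, family=sm.families.Poisson())

# Same sum-to-zero constraint, now on the log-rate coefficients
res_pc = mod_p.fit_constrained(
    "C(race)[1] + C(race)[2] + C(race)[3] + C(race)[4] = 0")
print(res_pc.summary())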
