How to add sum to zero constraint to GLM in Python?

Submitted by 怎甘沉沦 on 2019-12-17 16:53:21

Question


I have a model set up in Python using the statsmodels glm function, but now I want to add a sum-to-zero constraint to the model.

The model is defined as follows:

import statsmodels.api as sm  # needed for sm.families
import statsmodels.formula.api as smf
model = smf.glm(formula="A ~ B + C + D", data=data, family=sm.families.Poisson()).fit()

In R, to add the constraint, I would simply do something like this:

model <- glm(A ~ B + C + D - 1, family=poisson(), data=data, contrasts=list(C="contr.sum", D="contr.sum"))

That adds the sum-to-zero constraint to both C and D, but I am not sure how to achieve the same in Python.

I have seen that there is a fit_constrained() method available, but I am not sure how to use it, or whether it is even the right tool for what I need.

http://statsmodels.sourceforge.net/devel/generated/statsmodels.genmod.generalized_linear_model.GLM.fit_constrained.html#statsmodels.genmod.generalized_linear_model.GLM.fit_constrained

Can anyone offer any advice on applying this constraint?


Answer 1:


Here is an example that illustrates fit_constrained. It uses the Gaussian family, since I couldn't quickly find a Poisson example with categorical variables:

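import pandas
import statsmodels.api as sm
from statsmodels.formula.api import glm

# load the hsb2 example dataset (comma-separated)
url = 'http://www.ats.ucla.edu/stat/data/hsb2.csv'
hsb2 = pandas.read_csv(url)

# "- 1" drops the intercept, so every level of race gets its own coefficient
mod = glm("write ~ C(race) - 1", data=hsb2)
res = mod.fit()  # unconstrained fit, for comparison
print(res.summary())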

Constrain all coefficients to sum to zero:

res_c = mod.fit_constrained('C(race)[1] + C(race)[2] + C(race)[3] + C(race)[4] = 0')
print(res_c.summary())

                 Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:                  write   No. Observations:                  200
Model:                            GLM   Df Residuals:                      197
Model Family:                Gaussian   Df Model:                            2
Link Function:               identity   Scale:                   1232.08314649
Method:                          IRLS   Log-Likelihood:                -993.41
Date:                Wed, 25 Mar 2015   Deviance:                   2.4149e+05
Time:                        16:42:37   Pearson chi2:                 2.41e+05
No. Iterations:                     1                                         
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
C(race)[1]     1.0002    221.565      0.005      0.996      -433.260   435.260
C(race)[2]   -41.1814    267.253     -0.154      0.878      -564.988   482.626
C(race)[3]    -6.3498    235.771     -0.027      0.979      -468.453   455.754
C(race)[4]    46.5311    100.184      0.464      0.642      -149.827   242.889
==============================================================================

Model has been estimated subject to linear equality constraints.

Constraints are comma-separated, and each one defaults to "equals zero" when no right-hand side is given:

res_c2 = mod.fit_constrained('C(race)[1] + C(race)[2], C(race)[3] + C(race)[4]')
print(res_c2.summary())

The last call prints:

                 Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:                  write   No. Observations:                  200
Model:                            GLM   Df Residuals:                      198
Model Family:                Gaussian   Df Model:                            1
Link Function:               identity   Scale:                   1438.99574167
Method:                          IRLS   Log-Likelihood:                -1008.9
Date:                Wed, 25 Mar 2015   Deviance:                   2.8204e+05
Time:                        16:42:37   Pearson chi2:                 2.82e+05
No. Iterations:                     1                                         
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
C(race)[1]    13.6286    242.003      0.056      0.955      -460.689   487.946
C(race)[2]   -13.6286    242.003     -0.056      0.955      -487.946   460.689
C(race)[3]   -41.6606    111.458     -0.374      0.709      -260.115   176.794
C(race)[4]    41.6606    111.458      0.374      0.709      -176.794   260.115
==============================================================================

Model has been estimated subject to linear equality constraints.

I'm not sure how to write a patsy formula so that no level is dropped when there are several categorical explanatory variables.



Source: https://stackoverflow.com/questions/29261018/how-to-add-sum-to-zero-constraint-to-glm-in-python
