Can we set / fix the coefficients in a regression equation in Python

Submitted by 可紊 on 2019-12-02 15:28:35

Question


I've been trying to find a way to specify pre-defined coefficients in an OLS/GLS regression in Python. I can do this in R using offset, but there doesn't seem to be anything similar in Python.

R equivalent:

model=lm(y~x+offset(0.2*z))

In this example x and z are the independent variables: the coefficient of x is estimated by the model, while the coefficient of z is fixed at 0.2.


Answer 1:


With statsmodels you can run regression analyses in Python in a style similar to R. Some of its regression models accept offset as an argument; GLM is one example.
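
For reference, here is a short sketch of why an offset fixes a coefficient. It assumes the Gaussian family with identity link, which is what GLM uses when no family is given; the symbols \beta_0 and \beta_1 are my own notation for the intercept and the coefficient of x, not something from the question. The offset enters the linear predictor with its coefficient held at 1:

```latex
\begin{aligned}
E[y] &= \beta_0 + \beta_1 x + \underbrace{0.2\,z}_{\text{offset}} \\
\Longleftrightarrow\quad E[y - 0.2\,z] &= \beta_0 + \beta_1 x
\end{aligned}
```

Only \beta_0 and \beta_1 are estimated; the contribution of z is fixed in advance.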

With a dataset such as this:

import statsmodels.formula.api as smf
import pandas as pd
import numpy as np

# Random example data; the values will differ on each run
df = pd.DataFrame(np.random.randn(8, 3), columns=list('yxz'))
df

#           y         x         z
# 0  0.091761 -1.987569 -0.219672
# 1  0.357113  1.477894 -0.518270
# 2 -0.808494 -0.501757  0.915402
# 3  0.328751 -0.529760  0.513267
# 4  0.097078  0.968645 -0.702053
# 5 -0.327662 -0.392108 -1.463515
# 6  0.296120  0.261055  0.005113
# 7 -0.234587 -1.415371 -0.420645

You can do it like this:

known = 0.2

res1 = smf.glm('y ~ x', data = df, offset=known*df['z']).fit()
print(res1.summary())

# ==============================================================================
# Dep. Variable:                      y   No. Observations:                    8
# Model:                            GLM   Df Residuals:                        6
# .
# .
# ==============================================================================
#                  coef    std err          z      P>|z|      [0.025      0.975]
# ------------------------------------------------------------------------------
# Intercept      0.0614      0.165      0.373      0.709      -0.261       0.384
# x              0.1478      0.148      0.995      0.320      -0.143       0.439
# ==============================================================================

You could also run a sanity check by doing the same thing manually: subtract the offset from y and regress the result on x alone.

offset = known*df['z']
y_offset = df['y'] - offset  # response with the known z contribution removed
df2 = pd.concat([y_offset, df['x']], axis=1)
df2.columns = ['y_offset', 'x']

res2 = smf.glm('y_offset ~ x', data=df2).fit()
print(res2.summary())

# ==============================================================================
# Dep. Variable:               y_offset   No. Observations:                    8
# Model:                            GLM   Df Residuals:                        6
# .
# .
# ==============================================================================
#                  coef    std err          z      P>|z|      [0.025      0.975]
# ------------------------------------------------------------------------------
# Intercept      0.0614      0.165      0.373      0.709      -0.261       0.384
# x              0.1478      0.148      0.995      0.320      -0.143       0.439
# ==============================================================================
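
The same trick carries over to plain least squares: move the known term to the left-hand side and fit only the remaining coefficients. Below is a minimal sketch using NumPy alone. The synthetic, noise-free data and the "true" values 1.0 and 2.0 are assumptions made purely for illustration; they are not taken from the question.

```python
import numpy as np

# Synthetic, noise-free data (an illustrative assumption):
# y = 1.0 + 2.0*x + 0.2*z, so the fit should recover 1.0 and 2.0.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
z = rng.normal(size=50)
known = 0.2  # pre-defined coefficient of z
y = 1.0 + 2.0 * x + known * z

# Move the known term to the left-hand side, then fit intercept and x only.
y_adj = y - known * z
X = np.column_stack([np.ones_like(x), x])  # design matrix: columns [1, x]
coef, *_ = np.linalg.lstsq(X, y_adj, rcond=None)
intercept, slope = coef

print(intercept, slope)  # approximately 1.0 and 2.0, up to floating-point error
```

With real, noisy data the estimates will of course not be exact, but the coefficient of z stays pinned at the chosen value because z never enters the design matrix.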


Source: https://stackoverflow.com/questions/52015778/can-we-set-fix-the-coefficients-in-a-regression-equation-in-python
