Is there an existing function to estimate fixed effect (one-way or two-way) from Pandas or Statsmodels.
There used to be a function in Statsmodels but it seems discontinued. And in Pandas, there is something called plm
, but I can't import it or run it using pd.plm()
.
As noted in the comments, PanelOLS has been removed from Pandas as of version 0.20.0. So you really have three options:
If you use Python 3 you can use
linearmodels
as specified in the more recent answer: https://stackoverflow.com/a/44836199/3435183Just specify various dummies in your
statsmodels
specification, e.g. usingpd.get_dummies
. May not be feasible if the number of fixed effects is large.Or do some groupby based demeaning and then use
statsmodels
(this would work if you're estimating lots of fixed effects). Here is a barebones version of what you could do for one way fixed effects:def areg(formula,data=None,absorb=None,cluster=None): y,X = patsy.dmatrices(formula,data,return_type='dataframe') ybar = y.mean() y = y - y.groupby(data[absorb]).transform('mean') + ybar Xbar = X.mean() X = X - X.groupby(data[absorb]).transform('mean') + Xbar reg = sm.OLS(y,X) # Account for df loss from FE transform reg.df_resid -= (data[absorb].nunique() - 1) return reg.fit(cov_type='cluster',cov_kwds={'groups':data[cluster].values})
And here is what you can do if using an older version of Pandas
:
An example with time fixed effects using pandas' PanelOLS
(which is in the plm module). Notice, the import of PanelOLS
:
>>> from pandas.stats.plm import PanelOLS
>>> df
y x
date id
2012-01-01 1 0.1 0.2
2 0.3 0.5
3 0.4 0.8
4 0.0 0.2
2012-02-01 1 0.2 0.7
2 0.4 0.5
3 0.2 0.3
4 0.1 0.1
2012-03-01 1 0.6 0.9
2 0.7 0.5
3 0.9 0.6
4 0.4 0.5
Note, the dataframe must have a multindex set ; panelOLS
determines the time
and entity
effects based on the index:
>>> reg = PanelOLS(y=df['y'],x=df[['x']],time_effects=True)
>>> reg
-------------------------Summary of Regression Analysis-------------------------
Formula: Y ~ <x>
Number of Observations: 12
Number of Degrees of Freedom: 4
R-squared: 0.2729
Adj R-squared: 0.0002
Rmse: 0.1588
F-stat (1, 8): 1.0007, p-value: 0.3464
Degrees of Freedom: model 3, resid 8
-----------------------Summary of Estimated Coefficients------------------------
Variable Coef Std Err t-stat p-value CI 2.5% CI 97.5%
--------------------------------------------------------------------------------
x 0.3694 0.2132 1.73 0.1214 -0.0485 0.7872
---------------------------------End of Summary---------------------------------
Docstring:
PanelOLS(self, y, x, weights = None, intercept = True, nw_lags = None,
entity_effects = False, time_effects = False, x_effects = None,
cluster = None, dropped_dummies = None, verbose = False,
nw_overlap = False)
Implements panel OLS.
See ols function docs
This is another function (like fama_macbeth
) where I believe the plan is to move this functionality to statsmodels
.
There is a package called linearmodels
(https://pypi.org/project/linearmodels/) that has a fairly complete fixed effects and random effects implementation including clustered standard errors. It does not use high-dimensional OLS to eliminate effects and so can be used with large data sets.
# Outer is entity, inner is time
entity = list(map(chr,range(65,91)))
time = list(pd.date_range('1-1-2014',freq='A', periods=4))
index = pd.MultiIndex.from_product([entity, time])
df = pd.DataFrame(np.random.randn(26*4, 2),index=index, columns=['y','x'])
from linearmodels.panel import PanelOLS
mod = PanelOLS(df.y, df.x, entity_effects=True)
res = mod.fit(cov_type='clustered', cluster_entity=True)
print(res)
This produces the following output:
PanelOLS Estimation Summary
================================================================================
Dep. Variable: y R-squared: 0.0029
Estimator: PanelOLS R-squared (Between): -0.0109
No. Observations: 104 R-squared (Within): 0.0029
Date: Thu, Jun 29 2017 R-squared (Overall): -0.0007
Time: 23:52:28 Log-likelihood -125.69
Cov. Estimator: Clustered
F-statistic: 0.2256
Entities: 26 P-value 0.6362
Avg Obs: 4.0000 Distribution: F(1,77)
Min Obs: 4.0000
Max Obs: 4.0000 F-statistic (robust): 0.1784
P-value 0.6739
Time periods: 4 Distribution: F(1,77)
Avg Obs: 26.000
Min Obs: 26.000
Max Obs: 26.000
Parameter Estimates
==============================================================================
Parameter Std. Err. T-stat P-value Lower CI Upper CI
------------------------------------------------------------------------------
x 0.0573 0.1356 0.4224 0.6739 -0.2127 0.3273
==============================================================================
F-test for Poolability: 1.0903
P-value: 0.3739
Distribution: F(25,77)
Included effects: Entity
It also has a formula interface which is similar to statsmodels,
mod = PanelOLS.from_formula('y ~ x + EntityEffects', df)
来源:https://stackoverflow.com/questions/24195432/fixed-effect-in-pandas-or-statsmodels