Pandas: Trouble implementing Panel OLS

Submitted by 大憨熊 on 2019-12-11 19:15:58

Question


I'm having trouble understanding how to implement Panel OLS in pandas. I have received help on this topic before and thought I understood it, but now that I am trying to implement it I am running into difficulty. Below is my data:

import pandas as pd

url = 'https://raw.githubusercontent.com/108michael/ms_thesis/master/crsp.dime.mpl.df.1'

# Load only the columns needed for the regression.
df = pd.read_csv(url, usecols=['date', 'cid', 'log_diff_rgdp', 'billsum_support',
                               'years_exp', 'leg_totalbills', 'unemployment',
                               'expendituresfor', 'direct_expenditures',
                               'indirect_expenditures', 'Republican', 'sen'])
df.head(1)

         cid  date  log_diff_rgdp  unemployment  leg_totalbills  years_exp  Republican  sen  billsum_support  expendituresfor  direct_expenditures  indirect_expenditures
0  N00013870  2007       0.026069           4.6              44          5         1.0  1.0              1.0              4.0                  4.0                    0.0


df=df.T.to_panel()

df=df.transpose(2,0,1)

df

<class 'pandas.core.panel.Panel'>
Dimensions: 505 (items) x 10 (major_axis) x 72 (minor_axis)
Items axis: N00000010 to N00035686
Major_axis axis: 2005 to 2014
Minor_axis axis: index to indirect_expenditures

It is my understanding (I could be wrong about this) that the Items axis contains all of the panels, that the Minor_axis contains all of the columns in each of the panels, and that the Major_axis is the time index. I have posted the first row of my data as it looked before I converted it to a Panel; billsum_support is the 4th column from the end. But when I try to regress with billsum_support as the Y variable, I get the following error.
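To make the layout question concrete, here is a minimal sketch with made-up data (the four rows and their values are assumptions, not taken from the dataset above). Indexing a Panel with `df[key]` looks the key up on the Items axis, and the Panel printed above has cid values there, which would be consistent with the KeyError. The sketch uses a MultiIndex DataFrame rather than a Panel, so it only mirrors the idea: a variable name can be looked up only on the axis that actually holds the variable names.

```python
import pandas as pd

# Hypothetical toy data in long ("stacked") form: one row per (date, cid).
long_df = pd.DataFrame({
    "date": [2007, 2007, 2008, 2008],
    "cid": ["N00000010", "N00013870", "N00000010", "N00013870"],
    "billsum_support": [1.0, 0.0, 2.0, 1.0],
    "years_exp": [5, 3, 6, 4],
})

# Stacked layout: (date, cid) is the row index, variables are the columns,
# so a lookup by variable name succeeds.
stacked = long_df.set_index(["date", "cid"]).sort_index()
y = stacked["billsum_support"]

# Wide layout: pivoting puts (variable, cid) pairs on the column axis.
wide = long_df.pivot_table(index="date", columns="cid")

# Put cid on the outer column level, mimicking a container keyed by cid.
cid_first = wide.swaplevel(0, 1, axis=1).sort_index(axis=1)

# The outer keys are now cids, so a variable name is not among them.
print(list(cid_first.columns.get_level_values(0).unique()))
print("billsum_support" in cid_first.columns.get_level_values(0))
```

In the stacked layout the regression variables are ordinary columns, which is the shape the working example mentioned below appears to use.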

reg=PanelOLS(y=df['billsum_support'],x=df[['years_exp', 'unemployment', 'dir_ind_expendituresfor']],time_effects=True)
reg
KeyError                                  Traceback (most recent call last)
/home/jayaramdas/anaconda3/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   1875             try:
-> 1876                 return self._engine.get_loc(key)
   1877             except KeyError:

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4027)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3891)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12408)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12359)()

KeyError: 'billsum_support'

I have seen the working example here, but that person seems to have their data in stacked format instead of a Panel. Does anyone with experience using PanelOLS see what I am doing wrong?


Answer 1:


I got it. Following up on ptrj and doing some simple exploring, I found the solution, which I am posting here:

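import numpy as np

# Note: Panel and to_panel() exist only in older pandas releases.
# Pivot to one row per date and one column per (variable, cid) pair.
df = df.pivot_table(index='date', columns='cid', fill_value=0, aggfunc=np.mean)

df = df.T.to_panel()

df = df.transpose(2, 1, 0)

# Convert the Panel back to a stacked (MultiIndex) DataFrame.
df = df.to_frame()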
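Since Panel and the legacy PanelOLS have been removed from recent pandas releases, the following is a minimal sketch of a time-effects regression on stacked data using only pandas and NumPy. It is not the original PanelOLS API: the helper `time_effects_ols` and the toy data are hypothetical, and it only returns slope coefficients (no standard errors or other statistics). It assumes the period is the first level of the row MultiIndex.

```python
import numpy as np
import pandas as pd


def time_effects_ols(stacked, y_col, x_cols):
    """OLS of y on x plus one dummy per period (time fixed effects).

    Hypothetical helper: assumes `stacked` has a MultiIndex whose first
    level is the time period.
    """
    y = stacked[y_col].to_numpy(dtype=float)
    periods = stacked.index.get_level_values(0)
    # One dummy column per period; together they absorb the intercept.
    codes, uniques = pd.factorize(periods)
    dummies = np.eye(len(uniques))[codes]
    X = np.column_stack([stacked[x_cols].to_numpy(dtype=float), dummies])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Return only the slopes on the regressors, not the period effects.
    return pd.Series(beta[: len(x_cols)], index=x_cols)


# Made-up data: y = 2 * years_exp + a period effect (10 in 2007, 20 in 2008).
toy = pd.DataFrame({
    "date": [2007, 2007, 2007, 2008, 2008, 2008],
    "cid": ["a", "b", "c", "a", "b", "c"],
    "years_exp": [1, 2, 4, 3, 5, 6],
    "billsum_support": [12.0, 14.0, 18.0, 26.0, 30.0, 32.0],
}).set_index(["date", "cid"])

coefs = time_effects_ols(toy, "billsum_support", ["years_exp"])
print(coefs)
```

Because the toy data has no noise, the estimated slope on `years_exp` should match the value used to generate it. For real work, a maintained package with a panel estimator is probably a better choice than this hand-rolled version.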


Source: https://stackoverflow.com/questions/37107796/pandas-trouble-implementing-panel-ols
