Python pandas calculate rolling stock beta using rolling apply to groupby object in vectorized fashion

一向 2020-12-17 06:45

I have a large data frame, df, containing 4 columns:

             id           period  ret_1m   mkt_ret_1m
131146       CAN00WG0     199609 -0.1538    0.0471         


        
3 answers
  •  慢半拍i (original poster)
    2020-12-17 06:55

    import numpy as np
    import pandas as pd

    def rolling_apply(df, period, func, min_periods=None):
        """Apply func to the `period` rows strictly before each row."""
        if min_periods is None:
            min_periods = period
        result = pd.Series(np.nan, index=df.index)

        for i in range(1, len(df)):
            # Subsample: rows [i - period, i), so row i itself is excluded.
            sub_df = df.iloc[max(i - period, 0):i, :]
            if len(sub_df) >= min_periods:
                # Mind the forward-looking bias: the return at time t must not
                # be included in the beta calculated for time t. The result is
                # written by position, so a non-consecutive index also works.
                result.iloc[i] = func(sub_df)
        return result
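
    A minimal sketch of how this function might be called. The helper `calc_beta` and the synthetic data are assumptions made for illustration (they are not part of the original thread); the column names `ret_1m` and `mkt_ret_1m` follow the question. The stock return is built as exactly 1.5 times the market return, so every estimated beta should come out as 1.5, and the first 12 rows stay NaN because they lack 12 months of prior history.

```python
import numpy as np
import pandas as pd

def rolling_apply(df, period, func, min_periods=None):
    # Same windowing as the function above; results are stored by position.
    if min_periods is None:
        min_periods = period
    result = pd.Series(np.nan, index=df.index)
    for i in range(1, len(df)):
        sub_df = df.iloc[max(i - period, 0):i, :]
        if len(sub_df) >= min_periods:
            result.iloc[i] = func(sub_df)
    return result

def calc_beta(sub_df):
    # Hypothetical helper: beta = cov(stock, market) / var(market).
    cov = np.cov(sub_df["ret_1m"], sub_df["mkt_ret_1m"])
    return cov[0, 1] / cov[1, 1]

# Synthetic single-stock history with a known beta of 1.5.
rng = np.random.default_rng(0)
mkt = rng.normal(0.01, 0.05, size=24)
demo = pd.DataFrame({"ret_1m": 1.5 * mkt, "mkt_ret_1m": mkt})

betas = rolling_apply(demo, period=12, func=calc_beta)
print(betas.isna().sum())   # rows without a full 12-row look-back window
print(betas.dropna().head())
```

    For the multi-stock frame in the question, the same call would go inside a loop over `df.groupby("id")` so that each stock only sees its own history.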
    

    I fixed a forward-looking bias in Happy001's code. This is a finance problem, so it pays to be cautious.

    I think vlmercado's answer is wrong. If you simply apply pd.rolling_cov and pd.rolling_var to the whole frame, the result is financially incorrect. First, the second stock, CAN00WH0, has no NaN betas at all, because its earliest windows use the returns of CAN00WG0, which is plainly wrong. Second, consider a stock that was suspended for ten years: a window counted in rows would still pull the pre-suspension observations into the beta calculation.
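
    The first problem can be reproduced on a small synthetic frame. This is only a sketch: the data are random, the window of 3 rows is arbitrary, and it uses the current `.rolling(...)` methods in place of the older `pd.rolling_cov` / `pd.rolling_var` functions. The helper name `beta_one_stock` is made up for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 6
df = pd.DataFrame({
    "id": ["CAN00WG0"] * n + ["CAN00WH0"] * n,
    "ret_1m": rng.normal(0, 0.05, 2 * n),
    "mkt_ret_1m": rng.normal(0, 0.04, 2 * n),
})
window = 3

# Naive: roll over the stacked frame, ignoring stock boundaries.
naive = (df["ret_1m"].rolling(window).cov(df["mkt_ret_1m"])
         / df["mkt_ret_1m"].rolling(window).var())

def beta_one_stock(g):
    # Hypothetical helper: rolling beta using one stock's rows only.
    cov = g["ret_1m"].rolling(window).cov(g["mkt_ret_1m"])
    return cov / g["mkt_ret_1m"].rolling(window).var()

per_stock = pd.concat(
    [beta_one_stock(g) for _, g in df.groupby("id")]
).sort_index()

# The first (window - 1) rows of the second stock have too little history.
early = df.index[df["id"] == "CAN00WH0"][: window - 1]
print(naive.loc[early])      # filled in: the windows mix in CAN00WG0's returns
print(per_stock.loc[early])  # NaN: not enough history for CAN00WH0 yet
```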

    I found that pandas rolling also accepts time-based windows on a Timestamp index, but it does not seem to work well with groupby. So I modified Happy001's code. It is not the fastest approach, but it is at least 20x faster than the original code.

    import pandas as pd

    # crsp_daily: one row per stock-day with columns date, permno, ret, mkt_ret
    crsp_daily['date'] = pd.to_datetime(crsp_daily['date'])
    crsp_daily = crsp_daily.set_index('date')  # time-based rolling needs a DatetimeIndex
    calc = crsp_daily[['permno', 'ret', 'mkt_ret']]
    grp = calc.groupby('permno')  # rolling beta for each stock
    pieces = []
    for stock, sub_df in grp:
        sub2_df = sub_df[['ret', 'mkt_ret']].sort_index()
        # 5-year rolling covariance matrix. Note that 'd' stands for days;
        # w/m/y offsets cannot be used, but s/d are available.
        beta_m = sub2_df.rolling('1825d', min_periods=150).cov()
        # Keep the 'mkt_ret' rows: 'ret' is cov(ret, mkt_ret), 'mkt_ret' is var(mkt_ret).
        beta_m = beta_m.xs('mkt_ret', level=1, axis=0)
        beta_m['beta'] = beta_m['ret'] / beta_m['mkt_ret']
        # Collect per-stock results; DataFrame.append was removed in pandas 2.0.
        pieces.append(beta_m[['beta']].assign(permno=stock))
    beta = pd.concat(pieces).reset_index()
    beta = beta[['date', 'permno', 'beta']]
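
    To illustrate why the window is measured in calendar time, here is a small sketch on synthetic data for a single stock that stops trading for ten years. The dates, the 30-day window, and `min_periods=5` are arbitrary choices for the example, and `rolling_beta` is a hypothetical helper, not code from the thread. On the first day after trading resumes, the time-based window holds a single observation and yields NaN, while a row-count window still reaches back across the gap.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
# Ten trading days, a ten-year suspension, then ten more days.
dates = pd.date_range("2000-01-03", periods=10, freq="D").append(
    pd.date_range("2010-01-04", periods=10, freq="D"))
mkt = rng.normal(0, 0.01, len(dates))
one = pd.DataFrame(
    {"ret": 0.8 * mkt + rng.normal(0, 0.002, len(dates)), "mkt_ret": mkt},
    index=dates,
)

def rolling_beta(frame, window, min_periods):
    # Hypothetical helper: rolling cov(ret, mkt_ret) / var(mkt_ret).
    cov = frame["ret"].rolling(window, min_periods=min_periods).cov(frame["mkt_ret"])
    var = frame["mkt_ret"].rolling(window, min_periods=min_periods).var()
    return cov / var

by_time = rolling_beta(one, "30d", 5)  # window measured in calendar days
by_rows = rolling_beta(one, 8, 5)      # window measured in rows

resume = one.index[10]  # first observation after the suspension
print(by_time.loc[resume])  # NaN: only one observation in the last 30 days
print(by_rows.loc[resume])  # a number built mostly from pre-suspension rows
```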
    
