I have a large data frame, df, containing 4 columns:
        id        period  ret_1m   mkt_ret_1m
131146  CAN00WG0  199609  -0.1538  0.0471
import numpy as np
import pandas as pd

def rolling_apply(df, period, func, min_periods=None):
    if min_periods is None:
        min_periods = period
    result = pd.Series(np.nan, index=df.index)
    for i in range(1, len(df)):
        sub_df = df.iloc[max(i - period, 0):i, :]  # subsample of up to `period` rows ending at row i-1
        if len(sub_df) >= min_periods:
            # Mind the forward-looking bias: the return at time t should not be
            # included in the beta calculated for time t, so the result is stored
            # one row later (this assumes a consecutive integer index).
            idx = sub_df.index[-1] + 1
            result[idx] = func(sub_df)
    return result
I fixed a forward-looking bias in Happy001's code. This is a finance problem, so the calculation needs care: the return at time t must not enter the beta estimated for time t.
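To make the bias fix concrete, here is a minimal sketch on made-up data. The names `beta`, `lagged_rolling_beta`, and `toy` are hypothetical, and the sketch writes results by position (`iloc`) rather than by `index[-1] + 1`, which should be equivalent when the index is a consecutive integer range.

```python
import numpy as np
import pandas as pd

def beta(sub_df):
    # Slope of stock return on market return: cov(r, m) / var(m)
    cov = np.cov(sub_df["ret_1m"], sub_df["mkt_ret_1m"])
    return cov[0, 1] / cov[1, 1]

def lagged_rolling_beta(df, period, min_periods=None):
    # Beta stored at row i uses only rows i-period .. i-1; row i itself is excluded.
    if min_periods is None:
        min_periods = period
    out = pd.Series(np.nan, index=df.index)
    for i in range(1, len(df)):
        window = df.iloc[max(i - period, 0):i]
        if len(window) >= min_periods:
            out.iloc[i] = beta(window)
    return out

# Synthetic monthly returns (assumption: 24 months, true beta of 1.5)
rng = np.random.default_rng(0)
mkt = rng.normal(0.01, 0.04, size=24)
toy = pd.DataFrame({
    "ret_1m": 1.5 * mkt + rng.normal(0, 0.01, size=24),
    "mkt_ret_1m": mkt,
})
toy_beta = lagged_rolling_beta(toy, period=12)
print(toy_beta)
```

With a 12-month window, the first 12 rows stay NaN, and the value at row 12 depends only on rows 0 to 11.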
I think vlmercado's answer is wrong. Simply applying pd.rolling_cov and pd.rolling_var to the whole frame gives incorrect results in a finance setting. First, the second stock, CAN00WH0, clearly has no NaN betas, because its first windows use the returns of CAN00WG0, which is plainly wrong. Second, consider a stock that was suspended for ten years: a window defined by a number of rows would still pull the observations from before the suspension into the beta calculation.
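The first problem can be reproduced on a small synthetic panel. This is only an illustrative sketch: the data are random, `window_beta` is a hypothetical helper, and it uses the current `Series.rolling(...)` API rather than the older `pd.rolling_cov` and `pd.rolling_var` functions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 6  # months per stock (assumption for the toy example)
panel = pd.DataFrame({
    "id": ["CAN00WG0"] * n + ["CAN00WH0"] * n,
    "ret_1m": rng.normal(0, 0.05, 2 * n),
    "mkt_ret_1m": np.tile(rng.normal(0, 0.04, n), 2),
})

def window_beta(g, window=3):
    # Rolling cov(ret, mkt) / var(mkt) over the last `window` rows of g
    cov = g["ret_1m"].rolling(window).cov(g["mkt_ret_1m"])
    var = g["mkt_ret_1m"].rolling(window).var()
    return cov / var

# Pooled: the window slides straight across the boundary between the two stocks
pooled = window_beta(panel)

# Per stock: each stock starts with its own NaN warm-up period
by_stock = pd.concat([window_beta(g) for _, g in panel.groupby("id")]).sort_index()

print(pd.DataFrame({"id": panel["id"], "pooled": pooled, "by_stock": by_stock}))
```

In the pooled column, the first rows of CAN00WH0 already carry a beta, computed partly from CAN00WG0's returns. In the per-stock column those rows are NaN, as they should be.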
I found that pandas' rolling also accepts time-based windows on a Timestamp index, but it does not seem to work well with groupby. So I modified Happy001's code. It is not the fastest approach, but it is at least 20x faster than the original code.
import pandas as pd

crsp_daily['date'] = pd.to_datetime(crsp_daily['date'])
crsp_daily = crsp_daily.set_index('date')  # time-based rolling needs a time series index
crsp_daily.index = pd.DatetimeIndex(crsp_daily.index)
calc = crsp_daily[['permno', 'ret', 'mkt_ret']]
grp = calc.groupby('permno')  # rolling beta for each stock
pieces = []
for stock, sub_df in grp:
    sub_df = sub_df.sort_index()
    # 5-year rolling window; note that D is for day, and you cannot use w/m/y (s/D are available)
    beta_m = sub_df[['ret', 'mkt_ret']].rolling('1825D', min_periods=150).cov()
    # on the 'mkt_ret' rows this ratio is cov(ret, mkt_ret) / var(mkt_ret), i.e. beta
    beta_m['beta'] = beta_m['ret'] / beta_m['mkt_ret']
    beta_m = beta_m.xs('mkt_ret', level=1, axis=0)
    # align on the date index (assumes one row per date within each stock)
    pieces.append(sub_df.assign(beta=beta_m['beta']))
beta = pd.concat(pieces)  # DataFrame.append was removed in pandas 2.0
beta = beta.reset_index()
beta = beta[['date', 'permno', 'beta']]
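The suspended-stock case shows why a time-based window matters. The sketch below uses synthetic data with a gap of about ten years. The `rolling_beta` helper, the 30-row window, and the 60-day window are assumptions chosen to keep the example small, not values from the code above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
# 40 trading days, a suspension of about ten years, then 40 more trading days
dates = pd.bdate_range("2000-01-03", periods=40).append(
    pd.bdate_range("2010-01-04", periods=40)
)
mkt = rng.normal(0, 0.01, len(dates))
stock = pd.DataFrame(
    {"ret": 1.2 * mkt + rng.normal(0, 0.005, len(dates)), "mkt_ret": mkt},
    index=dates,
)

def rolling_beta(frame, window, min_periods):
    # cov(ret, mkt_ret) / var(mkt_ret) over an integer or time-offset window
    cov = frame["ret"].rolling(window, min_periods=min_periods).cov(frame["mkt_ret"])
    var = frame["mkt_ret"].rolling(window, min_periods=min_periods).var()
    return cov / var

by_count = rolling_beta(stock, 30, 30)    # last 30 rows, whatever their dates
by_time = rolling_beta(stock, "60D", 30)  # rows within the last 60 calendar days

first_day_back = dates[40]
print(by_count.loc[first_day_back], by_time.loc[first_day_back])
```

On the first day after the suspension, the row-count window still returns a beta built almost entirely from ten-year-old returns. The time-based window returns NaN until 30 fresh observations have accumulated.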