Efficient Python Pandas Stock Beta Calculation on Many Dataframes

太阳男子 2020-12-07 13:29

I have many (4000+) CSVs of stock data (Date, Open, High, Low, Close), which I import into individual pandas DataFrames to perform analysis. I am new to Python and want to c…
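A minimal sketch, not taken from the thread, of one way to turn thousands of per-ticker CSVs into the single wide layout the answer below works on (rows are dates, one column per stock). The helper name `load_close_returns`, the `*.csv` pattern, and using the file name as the ticker are all assumptions; only the Date and Close columns from the question are read.

```python
from pathlib import Path

import pandas as pd


def load_close_returns(csv_dir: str, pattern: str = "*.csv") -> pd.DataFrame:
    """Hypothetical helper: combine one-CSV-per-ticker files into a wide frame of returns.

    Assumes every file has 'Date' and 'Close' columns (as in the question) and
    that the file stem (e.g. 'AAPL' for 'AAPL.csv') is the ticker name.
    """
    closes = {}
    for path in sorted(Path(csv_dir).glob(pattern)):
        # Read only the two columns that are needed, indexed by parsed dates.
        df = pd.read_csv(path, usecols=["Date", "Close"], parse_dates=["Date"], index_col="Date")
        closes[path.stem] = df["Close"]
    if not closes:
        raise ValueError(f"no files matching {pattern!r} in {csv_dir!r}")
    # Outer-join on Date: a ticker missing a date gets NaN there.
    prices = pd.concat(closes, axis=1).sort_index()
    # Simple returns; the first row has no previous price, so it is dropped.
    returns = prices / prices.shift(1) - 1
    return returns.iloc[1:]
```

For example, `returns = load_close_returns("data/")` (a hypothetical directory) would play the role of `stocks` in the answer. Tickers with different date ranges leave NaNs in the outer-joined frame, and those should be handled (dropped or filled) before computing betas.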

6 answers
  •  轮回少年
    2020-12-07 14:01

    This further optimizes @piRSquared's implementation for both speed and memory. The code is also simplified for clarity.
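    For reference, the value `calc_beta` returns is the ordinary least-squares slope of each window. With n observations per window, market returns m_t, stock returns s_t, and X the 2 × n design matrix whose first row is all ones, the normal equations give the intercept and slope, and the slope reduces to the familiar covariance-over-variance form. `calc_beta` uses `pinv`, which equals the ordinary inverse whenever X Xᵀ is invertible (that is, whenever the market is not constant over the window). Because s may be an n × w matrix, one solve returns the betas of all w stocks at once.

```latex
X = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ m_1 & m_2 & \cdots & m_n \end{pmatrix}, \qquad
\begin{pmatrix} \hat\alpha \\ \hat\beta \end{pmatrix} = \left( X X^{\top} \right)^{-1} X s,
\qquad
\hat\beta = \frac{\sum_{t=1}^{n} (m_t - \bar m)(s_t - \bar s)}{\sum_{t=1}^{n} (m_t - \bar m)^2}
          = \frac{\operatorname{Cov}(s, m)}{\operatorname{Var}(m)}
```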

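    import numpy as np
    from numpy.lib.stride_tricks import as_strided
    from numpy.linalg import pinv
    from pandas import DataFrame, date_range
    
    def calc_beta(s: np.ndarray, m: np.ndarray) -> np.ndarray:
        # OLS of every stock column of s (period x w) on the market m (period,):
        # x is the 2 x period design matrix [1; m] and b = (x x^T)^+ x s.
        x = np.vstack((np.ones_like(m), m))
        b = pinv(x.dot(x.T)).dot(x).dot(s)
        return b[1]  # row 0 is alpha, row 1 is beta for every stock
    
    def rolling_calc_beta(s_df: DataFrame, m_df: DataFrame, period: int) -> DataFrame:
        s = s_df.to_numpy(dtype=float)          # shape (l, w)
        m = m_df.to_numpy(dtype=float).ravel()  # shape (l,)
        l, w = s.shape
        ls, ws = s.strides
        (lm,) = m.strides                       # the market needs its own row stride
        # Read-only views of the l - period + 1 trailing windows; nothing is copied.
        s_arr = as_strided(s, shape=(l - period + 1, period, w), strides=(ls, ls, ws), writeable=False)
        m_arr = as_strided(m, shape=(l - period + 1, period), strides=(lm, lm), writeable=False)
        result = np.full((l, w), np.nan)        # the first period - 1 rows stay NaN
        for i in range(l - period + 1):
            # window i spans rows i .. i + period - 1, so its beta belongs on the last row
            result[i + period - 1] = calc_beta(s_arr[i], m_arr[i])
        return DataFrame(data=result, index=s_df.index, columns=s_df.columns)
    
    if __name__ == '__main__':
        num_sec_dfs, num_periods = 4000, 480
    
        # month-end dates; pandas >= 2.2 prefers the alias freq='ME'
        dates = date_range('1995-12-31', periods=num_periods, freq='M', name='Date')
        stocks = DataFrame(data=np.random.rand(num_periods, num_sec_dfs), index=dates,
                           columns=['s{:04d}'.format(i) for i in range(num_sec_dfs)])
        market = DataFrame(data=np.random.rand(num_periods), index=dates, columns=['Market'])
        # pct_change() leaves a NaN first row; drop it so that no window contains NaN
        stocks = stocks.pct_change().iloc[1:]
        market = market.pct_change().iloc[1:]
        betas = rolling_calc_beta(stocks, market, 12)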
    

    %timeit betas = rolling_calc_beta(stocks, market, 12)

    335 ms ± 2.69 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
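    If the per-window Python loop is still the bottleneck, the same OLS betas can be computed for all windows at once from the covariance-over-variance form. This is a sketch of a swapped-in technique, not part of the answer above: it assumes NumPy >= 1.20 (for `sliding_window_view`), NaN-free returns, and the same wide layout (rows are dates, columns are stocks); `rolling_beta_vectorized` is a hypothetical name. Only the small market windows are demeaned, because the sum of (m - m_bar)(s - s_bar) equals the sum of (m - m_bar) s when the market deviations sum to zero.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def rolling_beta_vectorized(s_df: pd.DataFrame, m: pd.Series, period: int) -> pd.DataFrame:
    """Hypothetical helper: rolling OLS beta of every column of s_df against m.

    Assumes s_df and m share the same row order and contain no NaN.
    """
    s = s_df.to_numpy(dtype=float)               # (l, w)
    mv = np.asarray(m, dtype=float).ravel()      # (l,), also accepts a one-column DataFrame
    s_win = sliding_window_view(s, period, axis=0)  # (k, w, period) view, k = l - period + 1
    m_win = sliding_window_view(mv, period)         # (k, period) view
    # Demeaned market per window; the stock windows are never demeaned or copied.
    m_dm = m_win - m_win.mean(axis=1, keepdims=True)
    # Sum over each window of (m - m_bar) * s, for every window k and stock w.
    cov = np.einsum("kwp,kp->kw", s_win, m_dm)
    # Sum over each window of (m - m_bar)^2; the 1/(n-1) factors cancel in the ratio.
    var = np.einsum("kp,kp->k", m_dm, m_dm)
    out = np.full(s.shape, np.nan)               # the first period - 1 rows stay NaN
    out[period - 1:] = cov / var[:, None]
    return pd.DataFrame(out, index=s_df.index, columns=s_df.columns)
```

    With the data above, a call would look like `rolling_beta_vectorized(stocks, market['Market'], 12)`. Each row should match the slope that `np.polyfit(market_window, stock_window, 1)[0]` gives for the trailing window ending on that row, up to floating-point rounding. The trade-off is that `einsum` over strided views avoids the loop but not the O(l · w · period) arithmetic, so timings should be measured on real data before switching.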
