I have many (4,000+) CSVs of stock data (Date, Open, High, Low, Close) which I import into individual Pandas dataframes to perform analysis. I am new to Python and want to calculate a rolling 12-month beta for each stock.
Further optimizing @piRSquared's implementation for both speed and memory; the code is also simplified for clarity. The idea is to build zero-copy, overlapping window views of the underlying NumPy arrays with as_strided and to compute the OLS betas for every stock in a window with one small matrix expression.
from numpy import nan, ndarray, ones_like, vstack, random
from numpy.lib.stride_tricks import as_strided
from numpy.linalg import pinv
from pandas import DataFrame, date_range
def calc_beta(s: ndarray, m: ndarray):
    # Regress every stock column in s on the market window m in one shot:
    # x is the 2 x period design matrix [1; m], and the OLS normal equations
    # give b = (x x^T)^+ x s, so b[1] holds the slope (beta) for each column.
    x = vstack((ones_like(m), m))
    b = pinv(x.dot(x.T)).dot(x).dot(s)
    return b[1]
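For a single series, calc_beta is just the slope of an ordinary least-squares fit, so it can be sanity-checked against numpy.polyfit on made-up data (the numbers below are arbitrary, not from the benchmark):

import numpy as np

rng = np.random.default_rng(0)
m = rng.standard_normal(12)
s = 0.5 + 1.7 * m + 0.1 * rng.standard_normal(12)   # intercept 0.5, slope 1.7, noise
# b[1] from calc_beta should match polyfit's leading (slope) coefficient.
assert np.isclose(calc_beta(s, m), np.polyfit(m, s, 1)[0])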
def rolling_calc_beta(s_df: DataFrame, m_df: DataFrame, period: int):
    result = ndarray(shape=s_df.shape, dtype=float)
    l, w = s_df.shape
    ls, ws = s_df.values.strides
    ms = m_df.values.strides[0]    # the market array has its own, smaller row stride
    result[0:period - 1, :] = nan  # not enough history for the first period-1 rows
    # Overlapping window views into the original buffers -- no data is copied.
    s_arr = as_strided(s_df.values, shape=(l - period + 1, period, w),
                       strides=(ls, ls, ws))
    m_arr = as_strided(m_df.values, shape=(l - period + 1, period),
                       strides=(ms, ms))
    # Window k ends at row k + period - 1, so the first full window fills
    # row period - 1 and the last window fills row l - 1.
    for row in range(period - 1, l):
        result[row, :] = calc_beta(s_arr[row - period + 1, :], m_arr[row - period + 1])
    return DataFrame(data=result, index=s_df.index, columns=s_df.columns)
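On NumPy 1.20+ the hand-written strides can be replaced with numpy.lib.stride_tricks.sliding_window_view, which builds the same zero-copy windows but validates the shapes and strides for you; a minimal sketch (the helper name make_windows is only for illustration):

from numpy.lib.stride_tricks import sliding_window_view

def make_windows(s_df: DataFrame, m_df: DataFrame, period: int):
    # sliding_window_view puts the window axis last, so swap axes to get
    # (n_windows, period, n_stocks), the same layout as the as_strided call above.
    s_arr = sliding_window_view(s_df.values, period, axis=0).swapaxes(1, 2)
    m_arr = sliding_window_view(m_df.values.ravel(), period)
    return s_arr, m_arr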
if __name__ == '__main__':
    num_sec_dfs, num_periods = 4000, 480

    dates = date_range('1995-12-31', periods=num_periods, freq='M', name='Date')
    # Random monthly data turned into returns; drop the all-NaN first row
    # that pct_change produces so the windows stay NaN-free.
    stocks = DataFrame(data=random.rand(num_periods, num_sec_dfs), index=dates,
                       columns=['s{:04d}'.format(i) for i in range(num_sec_dfs)]
                       ).pct_change().iloc[1:]
    market = DataFrame(data=random.rand(num_periods), index=dates,
                       columns=['Market']).pct_change().iloc[1:]
    betas = rolling_calc_beta(stocks, market, 12)
%timeit betas = rolling_calc_beta(stocks, market, 12)
335 ms ± 2.69 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
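Since the OLS slope with an intercept equals Cov(s, m) / Var(m) over each window, the strided result can be cross-checked against a plain (slower) pandas computation; a minimal sketch, reusing stocks, market and betas from the block above:

from numpy import allclose

mkt = market['Market']
# Rolling Cov(stock, market) column-wise, divided by rolling Var(market).
ref = stocks.rolling(12).cov(mkt).div(mkt.rolling(12).var(), axis=0)
# Compare where both sides are defined (the first 11 rows are NaN).
assert allclose(betas.iloc[11:], ref.iloc[11:])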