Efficient Python Pandas Stock Beta Calculation on Many Dataframes

前端未结

关注

 6  1293

太阳男子 2020-12-07 13:29

I have many (4000+) CSVs of stock data (Date, Open, High, Low, Close) which I import into individual Pandas dataframes to perform analysis. I am new to python and want to c

6条回答

轻奢々 (楼主)

2020-12-07 14:05

While efficient subdivision of the input data set into rolling windows is important to the optimization of the overall calculations, the performance of the beta calculation itself can also be significantly improved.

The following optimizes only the subdivision of the data set into rolling windows:

def numpy_betas(x_name, window, returns_data, intercept=True):
    if intercept:
        ones = numpy.ones(window)

    def lstsq_beta(window_data):
        x_data = numpy.vstack([window_data[x_name], ones]).T if intercept else window_data[[x_name]]
        beta_arr, residuals, rank, s = numpy.linalg.lstsq(x_data, window_data)
        return beta_arr[0]

    indices = [int(x) for x in numpy.arange(0, returns_data.shape[0] - window + 1, 1)]
    return DataFrame(
        data=[lstsq_beta(returns_data.iloc[i:(i + window)]) for i in indices]
        , columns=list(returns_data.columns)
        , index=returns_data.index[window - 1::1]
    )

The following also optimizes the beta calculation itself:

def custom_betas(x_name, window, returns_data):
    window_inv = 1.0 / window
    x_sum = returns_data[x_name].rolling(window, min_periods=window).sum()
    y_sum = returns_data.rolling(window, min_periods=window).sum()
    xy_sum = returns_data.mul(returns_data[x_name], axis=0).rolling(window, min_periods=window).sum()
    xx_sum = numpy.square(returns_data[x_name]).rolling(window, min_periods=window).sum()
    xy_cov = xy_sum - window_inv * y_sum.mul(x_sum, axis=0)
    x_var = xx_sum - window_inv * numpy.square(x_sum)
    betas = xy_cov.divide(x_var, axis=0)[window - 1:]
    betas.columns.name = None
    return betas

Comparing the performance of the two different calculations, you can see that as the window used in the beta calculation increases, the second method dramatically outperforms the first:

Comparing the performance to that of @piRSquared's implementation, the custom method takes roughly 350 millis to evaluate compared to over 2 seconds.

0 讨论(0)

查看其它6个回答