Efficient Python Pandas Stock Beta Calculation on Many Dataframes

太阳男子 2020-12-07 13:29

I have many (4000+) CSVs of stock data (Date, Open, High, Low, Close), which I import into individual Pandas dataframes to perform analysis. I am new to Python and want to c…
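The answer below expects a single DataFrame holding one returns column per stock, plus the column passed as `x_name` (for example a market index loaded the same way). Here is a minimal sketch of building that from per-stock CSVs. It assumes each file has the Date and Close columns listed above and is keyed by ticker. `load_returns` and the in-memory sample data are hypothetical.

```python
import io

import pandas as pd


def load_returns(sources):
    """Hypothetical helper: one wide DataFrame of close-to-close returns.

    `sources` maps a ticker to anything pd.read_csv accepts (a path or a buffer).
    Each CSV is assumed to contain at least Date and Close columns.
    """
    closes = {}
    for ticker, src in sources.items():
        frame = pd.read_csv(src, usecols=["Date", "Close"], parse_dates=["Date"])
        closes[ticker] = frame.set_index("Date")["Close"]
    # Align every ticker on a common date index; dates a ticker lacks become NaN
    prices = pd.concat(closes, axis=1).sort_index()
    # Simple returns; the first row has no previous close, so drop it
    return (prices.div(prices.shift(1)) - 1).iloc[1:]


# With real files this might be, e.g. (directory name is hypothetical):
# load_returns({p.stem: p for p in pathlib.Path("data").glob("*.csv")})
sample = {
    "AAA": io.StringIO("Date,Open,High,Low,Close\n2020-01-02,1,1,1,10\n"
                       "2020-01-03,1,1,1,11\n2020-01-06,1,1,1,12.1\n"),
    "BBB": io.StringIO("Date,Open,High,Low,Close\n2020-01-02,1,1,1,20\n"
                       "2020-01-03,1,1,1,19\n2020-01-06,1,1,1,19\n"),
}
returns = load_returns(sample)
print(returns)
```

Dates missing for one ticker become NaN after alignment, and returns next to such a gap are NaN too, so it is worth checking for missing values before running a rolling calculation.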

6 Answers
  •  轻奢々
    2020-12-07 14:05

    Efficiently splitting the input data set into rolling windows matters for overall performance, but the beta calculation itself can also be made significantly faster.
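    For reference, the beta computed here is the slope of an ordinary least-squares fit, with an intercept, of each column's returns on the `x_name` column's returns (typically the market index). That slope equals cov(x, y) / var(x). The sketch below is a quick check of this equivalence on made-up data. The random series and the true beta of 1.3 are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.01, size=250)             # hypothetical market returns
y = 1.3 * x + rng.normal(0.0, 0.005, size=250)  # hypothetical stock returns

# Beta as covariance over variance (same ddof in numerator and denominator)
beta_cov = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Beta as the slope of an OLS fit with an intercept column
design = np.column_stack([x, np.ones_like(x)])
(beta_ols, alpha_ols), *_ = np.linalg.lstsq(design, y, rcond=None)

print(beta_cov, beta_ols)
```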

    The following optimizes only the subdivision of the data set into rolling windows:

    import numpy
    from pandas import DataFrame


    def numpy_betas(x_name, window, returns_data, intercept=True):
        # Rolling OLS beta of every column of returns_data against returns_data[x_name]
        ones = numpy.ones(window)

        def lstsq_beta(window_data):
            # Design matrix: the regressor column, plus a column of ones when fitting an intercept
            if intercept:
                x_data = numpy.column_stack([window_data[x_name].to_numpy(), ones])
            else:
                x_data = window_data[[x_name]].to_numpy()
            beta_arr, residuals, rank, s = numpy.linalg.lstsq(x_data, window_data.to_numpy(), rcond=None)
            return beta_arr[0]  # first row holds the slope (beta) for every column

        n_windows = returns_data.shape[0] - window + 1
        return DataFrame(
            data=[lstsq_beta(returns_data.iloc[i:i + window]) for i in range(n_windows)],
            columns=returns_data.columns,
            index=returns_data.index[window - 1:],
        )
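    The loop above still slices one window at a time with `iloc`. As an alternative that is not part of the original answer, NumPy 1.20+ offers `sliding_window_view`, which exposes all windows as a strided view so the per-window slope can be computed with array operations. This is a sketch that assumes a float DataFrame with no missing values. `rolling_betas_strided` is a hypothetical name.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def rolling_betas_strided(x_name, window, returns_data):
    """Hypothetical helper: rolling OLS beta (with intercept) of every column vs x_name."""
    values = returns_data.to_numpy(dtype=float)       # shape (n, k)
    x = returns_data[x_name].to_numpy(dtype=float)    # shape (n,)
    # Strided views, no copy: y_win is (m, k, window), x_win is (m, window), m = n - window + 1
    y_win = sliding_window_view(values, window, axis=0)
    x_win = sliding_window_view(x, window)
    # Demeaned regressor; it sums to zero in each window, so sum(dx * y) == sum(dx * (y - y_mean))
    dx = x_win - x_win.mean(axis=1, keepdims=True)    # (m, window)
    cov = np.einsum("mw,mkw->mk", dx, y_win)          # window-scaled covariances, (m, k)
    var = np.einsum("mw,mw->m", dx, dx)               # window-scaled variances, (m,)
    return pd.DataFrame(
        cov / var[:, None],
        columns=returns_data.columns,
        index=returns_data.index[window - 1:],
    )
```

    This avoids the Python-level loop, but the `einsum` still does work proportional to the window length for every row, so for long windows the rolling-sum approach should remain cheaper.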
    

    The following also optimizes the beta calculation itself:

    import numpy


    def custom_betas(x_name, window, returns_data):
        # Rolling beta of every column against returns_data[x_name], built from rolling sums
        window_inv = 1.0 / window
        x = returns_data[x_name]
        x_sum = x.rolling(window, min_periods=window).sum()
        y_sum = returns_data.rolling(window, min_periods=window).sum()
        xy_sum = returns_data.mul(x, axis=0).rolling(window, min_periods=window).sum()
        xx_sum = numpy.square(x).rolling(window, min_periods=window).sum()
        # window * cov(x, y) and window * var(x); the common factor cancels in the ratio
        xy_cov = xy_sum - window_inv * y_sum.mul(x_sum, axis=0)
        x_var = xx_sum - window_inv * numpy.square(x_sum)
        # Drop the first window - 1 rows, which are NaN because their windows are incomplete
        betas = xy_cov.divide(x_var, axis=0).iloc[window - 1:]
        betas.columns.name = None
        return betas
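    The rolling-sum version relies on the standard identity below. The sums run over the n = `window` observations in one window, where x is the `x_name` column and y is any other column. Covariance and variance carry the same 1/(n-1) (or 1/n) normalisation, which cancels in the ratio. That is why `xy_cov` and `x_var` in the code are left unnormalised:

```latex
\beta = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)}
      = \frac{\sum_i x_i y_i - \frac{1}{n} \sum_i x_i \sum_i y_i}
             {\sum_i x_i^2 - \frac{1}{n} \left( \sum_i x_i \right)^2}
```

    This one-pass form can lose precision when the data's mean is large relative to its spread. For daily returns, which are centred near zero, that is usually not a concern.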
    

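    Comparing the performance of the two calculations shows that, as the window used in the beta calculation increases, the second method dramatically outperforms the first. This is expected. The first method runs one least-squares fit per window, so its cost grows with the window length. The rolling sums in the second method are updated incrementally, at roughly constant cost per row.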

    Compared with @piRSquared's implementation, the custom method takes roughly 350 ms to evaluate versus over 2 seconds.
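    Timings like these depend on the machine, data size, and window. Below is a minimal harness for rerunning a comparison on synthetic data. The helper names, the 1000×5 data shape, and the window sizes are assumptions. The two functions are condensed re-implementations of the per-window `lstsq` approach and the rolling-sum approach, not the exact code above.

```python
import time

import numpy as np
import pandas as pd


def loop_betas(x_name, window, df):
    # One least-squares fit per window, in the style of numpy_betas
    rows = []
    for i in range(len(df) - window + 1):
        win = df.iloc[i:i + window]
        design = np.column_stack([win[x_name].to_numpy(), np.ones(window)])
        rows.append(np.linalg.lstsq(design, win.to_numpy(), rcond=None)[0][0])
    return pd.DataFrame(rows, columns=df.columns, index=df.index[window - 1:])


def rolling_sum_betas(x_name, window, df):
    # Incrementally updated rolling sums, in the style of custom_betas
    def roll(obj):
        return obj.rolling(window).sum()

    x = df[x_name]
    x_sum, y_sum = roll(x), roll(df)
    cov = roll(df.mul(x, axis=0)) - y_sum.mul(x_sum, axis=0) / window
    var = roll(x * x) - x_sum * x_sum / window
    return cov.div(var, axis=0).iloc[window - 1:]


def best_time(fn, *args, repeat=3):
    # Best wall-clock time over a few runs, in seconds
    times = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - start)
    return min(times)


rng = np.random.default_rng(42)
returns = pd.DataFrame(rng.normal(0.0, 0.01, size=(1000, 5)),
                       columns=["market", "s1", "s2", "s3", "s4"])
for window in (20, 60, 250):
    t_loop = best_time(loop_betas, "market", window, returns)
    t_roll = best_time(rolling_sum_betas, "market", window, returns)
    print(f"window={window}: loop {t_loop:.3f} s, rolling sums {t_roll:.3f} s")
```

    Absolute numbers will vary with hardware and library versions. The trend as the window grows is the relevant part.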
