Efficient Python Pandas Stock Beta Calculation on Many Dataframes

太阳男子 2020-12-07 13:29

I have many (4000+) CSVs of stock data (Date, Open, High, Low, Close) which I import into individual Pandas dataframes to perform analysis. I am new to python and want to c

6 Answers
  •  不思量自难忘°
    2020-12-07 14:15

    HERE'S THE SIMPLEST AND FASTEST SOLUTION

    The accepted answer was too slow for what I needed, and I didn't understand the math behind the solutions claimed to be faster. They also gave different answers, though in fairness I probably just messed something up.

    I don't think you need a custom rolling function to calculate beta with pandas 1.1.4 (or any version since at least 0.19). The code below assumes the data is in the same format as in the problems above: a pandas dataframe with a date index, holding percent returns of some periodicity for the stocks, with the market returns in a column named 'Market'.

    If you don't have this format, I recommend joining the stock returns to the market returns to ensure the same index with:

    # Use .pct_change() only if joining Close data
    beta_data = stock_data.join(market_data, how='inner').pct_change().dropna()
    

    After that, it's just covariance divided by variance.
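Written out, the rolling beta described here is the ratio below. The symbols are my own notation rather than the answer's: r_s is the stock's return series, r_m is the 'Market' return series, and the subscript t means the statistic is computed over the window ending at date t.

```latex
\beta_t = \frac{\operatorname{Cov}_t(r_s, r_m)}{\operatorname{Var}_t(r_m)}
```

Because the sample covariance and sample variance share the same normalising factor, this ratio equals the slope of an ordinary least-squares fit of r_s on r_m over the same window.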

    
    import pandas as pd

    # window: number of periods per rolling window; stock: the stock's column name
    ticker_covariance = beta_data.rolling(window).cov()
    # Limit results to the stock (i.e. column name for the stock) vs. 'Market' covariance
    ticker_covariance = ticker_covariance.loc[pd.IndexSlice[:, stock], 'Market'].dropna()
    benchmark_variance = beta_data['Market'].rolling(window).var().dropna()
    beta = ticker_covariance / benchmark_variance
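
To sanity-check the covariance-over-variance approach, here is a minimal self-contained sketch on synthetic data. Everything in it is an assumption for illustration: the ticker name "AAA", the 20-period window, and the random returns. It also swaps the .loc/IndexSlice selection for .xs, so that the covariance series is indexed by date only before the division.

```python
import numpy as np
import pandas as pd

# Synthetic data (an assumption for illustration): 60 daily returns,
# with a made-up ticker "AAA" built to have a true beta of about 1.5.
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=60, freq="D", name="Date")
market = rng.normal(0.0, 0.01, size=len(dates))
beta_data = pd.DataFrame(
    {
        "AAA": 1.5 * market + rng.normal(0.0, 0.002, size=len(dates)),
        "Market": market,
    },
    index=dates,
)

window = 20
stock = "AAA"

# Pairwise rolling covariance: rows are indexed by (date, column name).
pairwise_cov = beta_data.rolling(window).cov()
# .xs selects the stock's rows and drops that level, leaving a date index.
ticker_covariance = pairwise_cov.xs(stock, level=1)["Market"].dropna()
benchmark_variance = beta_data["Market"].rolling(window).var().dropna()
beta = ticker_covariance / benchmark_variance

# Cross-check the last value against an ordinary least-squares slope
# fitted on the same final window.
last = beta_data.iloc[-window:]
ols_slope = np.polyfit(last["Market"].to_numpy(), last[stock].to_numpy(), 1)[0]
print(beta.iloc[-1], ols_slope)
```

The two printed numbers should agree to floating-point precision, since the regression slope and the covariance/variance ratio are the same quantity.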
    

    NOTES: If you have a multi-index, you'll have to drop the non-date levels to use the rolling().apply() solution. I only tested this for one stock and one market. If you have multiple stocks, the ticker_covariance selection after .loc will probably need modifying. Lastly, if you want beta values for the periods before the full window is available (e.g. stock_data begins 1 year ago, but you use a 3-year window), you can run the same calculation with an expanding (instead of rolling) window and then .combine_first() the two results.
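
The last two notes can be sketched together. This is one possible approach under stated assumptions, not something tested in the answer: the tickers "AAA" and "BBB", the window length, and the synthetic returns are all made up. It passes the 'Market' series as `other` to `.cov()`, which gives one covariance column per stock, and then fills the pre-window dates from an expanding calculation with .combine_first().

```python
import numpy as np
import pandas as pd

# Synthetic returns (assumption): two made-up tickers with true betas
# of roughly 0.5 and 2.0 against the 'Market' column.
rng = np.random.default_rng(1)
dates = pd.date_range("2020-01-01", periods=50, freq="D", name="Date")
market = rng.normal(0.0, 0.01, size=len(dates))
returns = pd.DataFrame(
    {
        "AAA": 0.5 * market + rng.normal(0.0, 0.001, size=len(dates)),
        "BBB": 2.0 * market + rng.normal(0.0, 0.001, size=len(dates)),
        "Market": market,
    },
    index=dates,
)

window = 20
stocks = returns.drop(columns="Market")
mkt = returns["Market"]

# Rolling beta for every stock column at once: covariance of each column
# with the market, divided row-wise by the market's rolling variance.
rolling_beta = stocks.rolling(window).cov(mkt).div(mkt.rolling(window).var(), axis=0)

# Same calculation on an expanding window, defined from the 2nd row onward.
expanding_beta = (
    stocks.expanding(min_periods=2)
    .cov(mkt)
    .div(mkt.expanding(min_periods=2).var(), axis=0)
)

# Prefer the rolling value; fall back to the expanding one where the
# rolling window is not yet full.
beta = rolling_beta.combine_first(expanding_beta)
print(beta.tail())
```

Only the first date stays NaN here, because a covariance needs at least two observations. Early expanding values rest on very few data points, so they are much noisier than the full-window values.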
