I have many (4000+) CSVs of stock data (Date, Open, High, Low, Close) which I import into individual Pandas dataframes to perform analysis. I am new to Python and want to calculate a rolling beta for each stock.
The available solutions work, but they become a bottleneck when you need beta calculations across m dates for n stocks, resulting in m × n calculations.
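For reference, here is a minimal sketch of a single beta calculation (the column names and the use of close-to-close returns are assumptions; a rolling version repeats this per window, which is where the m × n cost comes from):

```python
import pandas as pd

def beta(market: pd.Series, stock: pd.Series) -> float:
    # Align on dates and drop rows where either side is missing,
    # so covariance and variance are computed over the same rows.
    prices = pd.concat([market, stock], axis=1, keys=["market", "stock"]).dropna()
    returns = prices.pct_change().dropna()
    # beta = Cov(stock, market) / Var(market)
    return returns["stock"].cov(returns["market"]) / returns["market"].var()
```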
Some relief can be gained by distributing the per-date or per-stock work across multiple cores, but then you end up needing serious hardware.
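A sketch of the per-stock variant, assuming one CSV per stock and a separate index CSV (both paths are hypothetical) and reusing the beta() helper above:

```python
import glob
from multiprocessing import Pool

import pandas as pd

MARKET_CSV = "index.csv"   # hypothetical path to the index data
STOCK_GLOB = "data/*.csv"  # hypothetical per-stock CSV layout

market = pd.read_csv(MARKET_CSV, parse_dates=["Date"], index_col="Date")

def beta_for_file(path):
    # Worker: load one stock's CSV and compute its beta against the index.
    stock = pd.read_csv(path, parse_dates=["Date"], index_col="Date")
    return path, beta(market["Close"], stock["Close"])

if __name__ == "__main__":
    with Pool() as pool:  # one worker per core by default
        results = dict(pool.map(beta_for_file, glob.glob(STOCK_GLOB)))
```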
The major time cost in the available solutions is computing the variance and covariance; NaNs must also be dropped from both the index and the stock data for a correct calculation (observed with pandas==0.23.0).
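Concretely, aligning the two series and dropping NaNs jointly before taking the moments keeps both statistics on the same set of rows (toy return data below):

```python
import numpy as np
import pandas as pd

index_ret = pd.Series([0.010, np.nan, 0.020, -0.010, 0.005])
stock_ret = pd.Series([0.030, 0.010, np.nan, 0.020, 0.015])

# Drop a row if *either* series is NaN; otherwise the covariance and the
# variance end up computed over different subsets of dates.
both = pd.concat([index_ret, stock_ret], axis=1, keys=["index", "stock"]).dropna()
beta = both["stock"].cov(both["index"]) / both["index"].var()
```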
Re-running the whole computation from scratch is therefore wasteful unless the results are cached.
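One way to cache across runs is joblib's on-disk Memory (joblib itself is an assumption here; any persistent cache keyed on the input file would do):

```python
from joblib import Memory

memory = Memory("beta_cache", verbose=0)  # results persist in ./beta_cache

@memory.cache
def cached_beta(path):
    # Computed once per path; subsequent runs read the result from disk.
    return beta_for_file(path)
```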
The numpy variance/covariance version likewise miscalculates beta if NaNs are not dropped first.
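A minimal demonstration of the failure mode, and the masking needed to avoid it:

```python
import numpy as np

index_ret = np.array([0.010, np.nan, 0.020, -0.010])
stock_ret = np.array([0.030, 0.010, 0.015, 0.020])

print(np.cov(index_ret, stock_ret))  # a single NaN poisons every entry

# Keep only positions where both arrays are valid, then compute beta.
mask = ~(np.isnan(index_ret) | np.isnan(stock_ret))
cov = np.cov(index_ret[mask], stock_ret[mask])
beta = cov[0, 1] / cov[0, 0]  # Cov(stock, index) / Var(index)
```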
A Cython implementation is a must for huge data sets.
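As a rough illustration, a typed Cython kernel for a single window could look like the sketch below (a hypothetical .pyx file compiled with cythonize; it assumes NaN-free, equal-length return arrays):

```cython
# beta_kernel.pyx -- a sketch, not the only possible layout.
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def beta(double[:] index, double[:] stock):
    cdef Py_ssize_t i, n = index.shape[0]
    cdef double mean_i = 0.0, mean_s = 0.0, cov = 0.0, var = 0.0
    # First pass: means of both return series.
    for i in range(n):
        mean_i += index[i]
        mean_s += stock[i]
    mean_i /= n
    mean_s /= n
    # Second pass: covariance numerator and index-variance denominator.
    for i in range(n):
        cov += (stock[i] - mean_s) * (index[i] - mean_i)
        var += (index[i] - mean_i) * (index[i] - mean_i)
    return cov / var
```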