Deprecated rolling window option in OLS from Pandas to Statsmodels

轻奢々 2020-12-07 17:16

As the title suggests, where has the rolling function option in the ols command in Pandas migrated to in statsmodels? I can't seem to find it. Pandas tells me doom is in th

4 answers
  •  执念已碎
    2020-12-07 17:42

    For completeness, here is a faster numpy-only solution that limits the calculations to the regression coefficients and the final estimate of each window.

    Numpy rolling regression function

    import numpy as np
    import pandas as pd

    def rolling_regression(y, x, window=60):
        """
        Rolling OLS of y on x (plus a constant); y and x must be pandas.Series.
        Returns the fitted value at the last point of each window, or None
        if fewer than `window` common observations exist.
        """
        # === Clean-up and align on the common index ==========================
        x = x.dropna()
        y = y.dropna()
        common = x.index.intersection(y.index)
        x, y = x[common], y[common]
        # === Verify enough data ==============================================
        if x.index.size < window:
            return None
        # === Add a constant ==================================================
        X = x.to_frame()
        X['c'] = 1
        # === Loop... this can be improved ====================================
        X_values, y_values = X.values, y.values  # index in numpy rather than pandas, much faster
        estimate_data = []
        for i in range(window, x.index.size + 1):
            X_slice = X_values[i - window:i, :]
            y_slice = y_values[i - window:i]
            # Normal equations: coeff = (X'X)^-1 X'y
            coeff = np.dot(np.dot(np.linalg.inv(np.dot(X_slice.T, X_slice)), X_slice.T), y_slice)
            # Fitted value at the last observation of the current window
            estimate_data.append(coeff[0] * X_values[i - 1, 0] + coeff[1])
        # === Assemble ========================================================
        estimate = pd.Series(data=estimate_data, index=x.index[window - 1:])
        return estimate
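
    The loop above is flagged as improvable. One possible direction, sketched here under the assumption of a single regressor and two already-aligned Series without NaNs, is to use the identity slope = cov(x, y) / var(x) with pandas rolling statistics. The function name rolling_fit_vectorized and the sample data are hypothetical, not part of the original answer.

```python
import numpy as np
import pandas as pd

def rolling_fit_vectorized(y, x, window=60):
    """Rolling single-regressor OLS: fitted value at each window's last point."""
    # Slope = cov(x, y) / var(x); both use ddof=1 by default, so the ratio is consistent
    slope = x.rolling(window).cov(y) / x.rolling(window).var()
    # Intercept follows from the window means: mean(y) - slope * mean(x)
    intercept = y.rolling(window).mean() - slope * x.rolling(window).mean()
    # The first window-1 entries are NaN (incomplete windows) and are dropped
    return (slope * x + intercept).dropna()

# Synthetic, aligned sample data (illustrative only)
rng = np.random.default_rng(1)
idx = pd.date_range("2020-01-01", periods=200, freq="D")
x = pd.Series(rng.normal(size=200), index=idx)
y = 0.5 * x + 3.0 + pd.Series(rng.normal(scale=0.2, size=200), index=idx)

estimate = rolling_fit_vectorized(y, x, window=60)
```

    This avoids the Python-level loop entirely, but it does not generalise to several regressors without further work.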
    

    Notes

    In use cases that only require the final estimate of the regression, x.rolling(window=60).apply(my_ols) appears to be somewhat slow.

    As a reminder, the coefficients of a regression can be calculated as a matrix product, as described on Wikipedia's least squares page. Doing this with numpy's matrix multiplication can speed up the process somewhat compared with using the ols in statsmodels. The product is expressed in the line starting with coeff = ...
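
    The matrix product referred to here is the normal-equations solution beta = (X'X)^-1 X'y. As a minimal sketch, the snippet below computes it for a single window and compares the result with np.polyfit. The helper name ols_coefficients and the synthetic data are assumptions for illustration, and np.linalg.solve is swapped in for the explicit inverse because it is usually the more numerically stable choice.

```python
import numpy as np

def ols_coefficients(x, y):
    """Return [slope, intercept] of y ~ slope * x + intercept via the normal equations."""
    # Design matrix: the regressor plus a constant column
    X = np.column_stack([x, np.ones(len(x))])
    # Solve (X'X) beta = X'y without forming the inverse explicitly
    return np.linalg.solve(X.T @ X, X.T @ y)

# Synthetic single window of 60 observations (illustrative only)
rng = np.random.default_rng(0)
x = rng.normal(size=60)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=60)

slope, intercept = ols_coefficients(x, y)
ref_slope, ref_intercept = np.polyfit(x, y, 1)  # reference fit for comparison
```

    Both approaches should agree to floating-point precision. The gain over statsmodels comes from skipping the summary statistics that a full OLS fit computes.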
