Deprecated rolling window option in OLS from Pandas to Statsmodels

轻奢々 2020-12-07 17:16

As the title suggests, where has the rolling function option in the ols command in Pandas migrated to in statsmodels? I can't seem to find it. Pandas tells me doom is in th

4 Answers

执念已碎 (original poster)

2020-12-07 17:42

For completeness, here is a faster NumPy-only solution that limits the calculation to the regression coefficients and the final estimate of each window.

NumPy rolling regression function

import numpy as np
import pandas as pd

def rolling_regression(y, x, window=60):
    """
    Rolling OLS of y on x plus a constant.

    y and x must be pandas.Series. Returns a pandas.Series with, for each
    window, the fitted value at the window's last observation, or None if
    fewer than `window` common observations are available.
    """
    # === Clean-up ========================================================
    x = x.dropna()
    y = y.dropna()
    # === Align on the labels present in both series ======================
    common = x.index.intersection(y.index)
    x = x.loc[common]
    y = y.loc[common]
    # === Verify enough data ==============================================
    if x.index.size < window:
        return None
    # === Add a constant ==================================================
    X = x.to_frame()
    X['c'] = 1
    # Convert once: indexing NumPy arrays is much faster than indexing pandas
    X_values = X.values
    y_values = y.values
    # === Loop... this can be improved ====================================
    estimate_data = []
    for i in range(window, x.index.size + 1):
        X_slice = X_values[i - window:i, :]
        y_slice = y_values[i - window:i]
        # Normal equations: coeff = (X'X)^-1 X'y -> [slope, intercept]
        coeff = np.linalg.inv(X_slice.T @ X_slice) @ X_slice.T @ y_slice
        # Fitted value at the last observation of the current window
        estimate_data.append(coeff[0] * X_slice[-1, 0] + coeff[1])
    # === Assemble ========================================================
    estimate = pd.Series(data=estimate_data, index=x.index[window - 1:])
    return estimate
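The loop is flagged above as something that can be improved. One possible way to remove it, sketched here under assumptions, is to build all windows at once with numpy.lib.stride_tricks.sliding_window_view (available in NumPy 1.20 and later) and use the closed-form slope and intercept of a single-regressor fit. The name rolling_fit_vectorized is hypothetical, and the sketch assumes two equal-length 1-D arrays with no NaNs rather than pandas.Series.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_fit_vectorized(y, x, window=60):
    """Hypothetical helper: rolling fitted values without a Python loop.

    Assumes y and x are equal-length 1-D arrays with no NaNs.
    Returns the fitted value at the last observation of each window,
    or None if there are fewer than `window` observations.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if x.size < window:
        return None
    # Each row is one window: shape (n - window + 1, window)
    xw = sliding_window_view(x, window)
    yw = sliding_window_view(y, window)
    x_mean = xw.mean(axis=1)
    y_mean = yw.mean(axis=1)
    # Per-window slope = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    dx = xw - x_mean[:, None]
    dy = yw - y_mean[:, None]
    slope = (dx * dy).sum(axis=1) / (dx ** 2).sum(axis=1)
    intercept = y_mean - slope * x_mean
    # Evaluate each window's fitted line at that window's last x value
    return slope * xw[:, -1] + intercept
```

This trades memory for speed, since the centered windows are materialized as full arrays; for very long series the explicit loop may still be the more practical choice.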

Notes

In some specific use cases that need only the final estimate of each regression, x.rolling(window=60).apply(my_ols) appears to be somewhat slow.

As a reminder, the coefficients of a linear regression can be computed as a matrix product, (X'X)^-1 X'y, as described on Wikipedia's least squares page. Computing it with NumPy's matrix multiplication can speed things up somewhat compared with using the OLS in statsmodels. This product is the line starting with coeff = ...
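As a quick sanity check of that matrix product, the sketch below compares it with np.linalg.lstsq on synthetic data. The data, the seed, and the function name ols_normal_equations are made up for illustration; the column order (regressor first, constant second) mirrors the function above.

```python
import numpy as np


def ols_normal_equations(X, y):
    """Illustrative helper: OLS coefficients via (X'X)^-1 X'y."""
    return np.linalg.inv(X.T @ X) @ X.T @ y


# Synthetic data (an assumption for this example): y = 3x + 0.5 + noise
rng = np.random.default_rng(0)
x = rng.normal(size=60)
y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=60)

# Design matrix with the regressor first and the constant second
X = np.column_stack([x, np.ones_like(x)])

beta_normal = ols_normal_equations(X, y)
# Reference solution from NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_normal)
print(np.allclose(beta_normal, beta_lstsq))
```

Explicitly inverting X'X is fine for a well-conditioned two-column design like this one; when the regressors are nearly collinear, np.linalg.lstsq or np.linalg.solve is generally the numerically safer option.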
