Deprecated rolling window option in OLS from Pandas to Statsmodels

轻奢々 2020-12-07 17:16

As the title suggests, where has the rolling function option in the ols command in Pandas migrated to in statsmodels? I can't seem to find it. Pandas tells me doom is in th

4 answers
  •  爱一瞬间的悲伤
    2020-12-07 17:37

    I created an ols module designed to mimic pandas' deprecated MovingOLS; it is here.

    It has three core classes:

    • OLS: static (single-window) ordinary least-squares regression. The outputs are NumPy arrays.
    • RollingOLS: rolling (multi-window) ordinary least-squares regression. The outputs are higher-dimensional NumPy arrays.
    • PandasRollingOLS: wraps the results of RollingOLS in pandas Series and DataFrames. Designed to mimic the look of the deprecated pandas module.

    Note that the module is part of a package (which I'm currently in the process of uploading to PyPI) and requires one inter-package import.

    The first two classes are implemented entirely in NumPy and rely primarily on matrix algebra. RollingOLS also makes extensive use of broadcasting. Attributes largely mimic those of statsmodels' OLS RegressionResultsWrapper.
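
    To make the matrix-algebra and broadcasting idea concrete, here is a minimal NumPy sketch of a rolling regression. It is not the module's actual code: the function name rolling_ols is hypothetical, an intercept is assumed, and each window is solved through the normal equations on synthetic data.

```python
import numpy as np


def rolling_ols(y, x, window):
    """Fit OLS on every length-`window` slice of (y, x).

    Returns an array of shape (n_windows, k + 1); each row holds the
    intercept followed by the k slope coefficients for one window.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n = y.shape[0]
    if not 0 < window <= n:
        raise ValueError("window must be between 1 and len(y)")

    # Broadcasting a column of window starts against a row of offsets
    # yields an (n_windows, window) table of row indices.
    idx = np.arange(n - window + 1)[:, None] + np.arange(window)[None, :]
    xw = x[idx]  # (n_windows, window, k)
    yw = y[idx]  # (n_windows, window)

    # Prepend a column of ones to each window for the intercept.
    ones = np.ones(xw.shape[:2] + (1,))
    xw = np.concatenate([ones, xw], axis=2)

    # Batched normal equations: (X'X) beta = X'y, solved for all windows at once.
    xt = xw.transpose(0, 2, 1)
    xtx = xt @ xw
    xty = xt @ yw[:, :, None]
    return np.linalg.solve(xtx, xty)[:, :, 0]


# Synthetic, noise-free data: y = 1 + 2*x1 - 3*x2
rng = np.random.default_rng(0)
x = rng.normal(size=(60, 2))
y = 1.0 + 2.0 * x[:, 0] - 3.0 * x[:, 1]

betas = rolling_ols(y, x, window=12)
print(betas.shape)  # (49, 3)
```

    The normal equations keep the sketch short and fully vectorized, but they are less numerically robust than a QR- or SVD-based solver, and a window with perfectly collinear regressors would make np.linalg.solve raise an error.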

    An example using PandasRollingOLS:

    import urllib.parse
    import pandas as pd
    from pyfinance.ols import PandasRollingOLS
    
    # You can also do this with pandas-datareader; here's the hard way
    url = "https://fred.stlouisfed.org/graph/fredgraph.csv"
    
    syms = {
        "TWEXBMTH": "usd",
        "T10Y2YM": "term_spread",
        "GOLDAMGBD228NLBM": "gold",
    }
    
    params = {
        "fq": "Monthly,Monthly,Monthly",
        "id": ",".join(syms.keys()),
        "cosd": "2000-01-01",
        "coed": "2019-02-01",
    }
    
    data = pd.read_csv(
        url + "?" + urllib.parse.urlencode(params, safe=","),
        na_values={"."},
        parse_dates=["DATE"],
        index_col=0
    ).pct_change().dropna().rename(columns=syms)
    print(data.head())
    #                  usd  term_spread      gold
    # DATE                                       
    # 2000-02-01  0.012580    -1.409091  0.057152
    # 2000-03-01 -0.000113     2.000000 -0.047034
    # 2000-04-01  0.005634     0.518519 -0.023520
    # 2000-05-01  0.022017    -0.097561 -0.016675
    # 2000-06-01 -0.010116     0.027027  0.036599
    
    y = data.usd
    x = data.drop('usd', axis=1)
    
    window = 12  # months
    model = PandasRollingOLS(y=y, x=x, window=window)
    
    print(model.beta.head())  # Coefficients excluding the intercept
    #             term_spread      gold
    # DATE                             
    # 2001-01-01     0.000033 -0.054261
    # 2001-02-01     0.000277 -0.188556
    # 2001-03-01     0.002432 -0.294865
    # 2001-04-01     0.002796 -0.334880
    # 2001-05-01     0.002448 -0.241902
    
    print(model.fstat.head())
    # DATE
    # 2001-01-01    0.136991
    # 2001-02-01    1.233794
    # 2001-03-01    3.053000
    # 2001-04-01    3.997486
    # 2001-05-01    3.855118
    # Name: fstat, dtype: float64
    
    print(model.rsq.head())  # R-squared
    # DATE
    # 2001-01-01    0.029543
    # 2001-02-01    0.215179
    # 2001-03-01    0.404210
    # 2001-04-01    0.470432
    # 2001-05-01    0.461408
    # Name: rsq, dtype: float64
    
