Missing intercepts of OLS Regression models in Python statsmodels

悲哀的现实 2020-12-12 05:52

I am running a rolling OLS regression (with a window of, for example, 100 observations) on the dataset found at this link (https://drive.google.com/drive/folders/0B2Iv8dfU4f

2 Answers
  • 2020-12-12 06:22

    Short Answer

    The value of r^2 will be +/- inf whenever y remains constant over the regression window (100 observations in your case). You can find more details below, but the intuition is that r^2 is the proportion of y's variance explained by X: if y's variance is zero, r^2 is simply not well defined.

    Possible solution: Try to use a longer window, or resample Y and X so that Y does not remain constant for so many consecutive observations.
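
    Before changing the window, it can help to check how many windows are actually affected. The following is a minimal sketch, not part of the original setup: the helper name `constant_windows` and the step-like test series are hypothetical, and it assumes y is available as a 1-D array. It compares the rolling maximum and minimum, which avoids floating-point noise in a rolling variance.

```python
import numpy as np
import pandas as pd

def constant_windows(y, window=100):
    """Flag each position whose trailing `window` observations of y are all equal.

    Hypothetical helper for illustration only.
    """
    y = pd.Series(y)
    # Max minus min is exactly zero only when every value in the window is equal.
    spread = y.rolling(window).max() - y.rolling(window).min()
    # Incomplete windows give NaN, which compares as False here.
    return spread.eq(0)

# Hypothetical step-like series, loosely imitating a y that stays flat for long runs.
y = np.repeat([1.0, 2.0, 3.0], [150, 50, 120])
flags = constant_windows(y, window=100)
print(int(flags.sum()), "windows have a constant y")
```

    Any window flagged here is one where r^2 cannot be computed, so the count gives a rough idea of whether a longer window or resampling would be enough.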

    Long Answer

    Looking at this I honestly think this is not the right dataset for the regression. This is a simple plot of the data:

    Does a linear combination of X and time explain Y? Mmm...doesn't look plausible. Y almost looks like a discrete variable, so you probably want to look at logistic regressions.

    To come back to your question: R^2 is, quoting Wikipedia, "the proportion of the variance in the dependent variable that is predictable from the independent variable(s)".

    In your case it is very likely that Y is constant over some 100-observation windows. It then has zero variance, which produces a division by zero, hence the inf.
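
    To make the division by zero concrete, here is a small sketch using the usual definition R^2 = 1 - SS_res / SS_tot, where SS_tot is the sum of squared deviations of y from its mean. This is plain NumPy written for illustration (the function `r_squared` is my own, not the statsmodels implementation), and the constant series is made up.

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot, computed naively so the zero division is visible."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares; 0 when y is constant
    # Silence NumPy's divide/invalid warnings so the raw result can be inspected.
    with np.errstate(divide="ignore", invalid="ignore"):
        return 1.0 - ss_res / ss_tot

y_const = np.full(100, 3.0)                     # y is constant over the window
r2_exact = r_squared(y_const, y_const)          # 0 / 0
r2_noisy = r_squared(y_const, y_const + 1e-8)   # tiny positive residual / 0
print(r2_exact, r2_noisy)
```

    With a perfect fit the ratio is 0/0, and with any nonzero residual it is a positive number divided by zero, so in neither case is there a meaningful r^2 to report.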

    So I am afraid you should not look for a fix in the code; instead, rethink the problem and the way you fit the data.

  • 2020-12-12 06:31

    Ok so I prepared this small example so you can visualize what a Poisson regression could do.

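    import numpy as np
    import matplotlib.pyplot as plt
    import statsmodels.api as sm
    
    # sm.Poisson is the same class as statsmodels.discrete.discrete_model.Poisson
    poi_model = sm.Poisson
    
    # Simulate 1000 Poisson counts whose mean grows linearly with x
    x = np.random.uniform(0, 20, 1000)
    s = np.random.poisson(x * 0.5, 1000)
    plt.bar(x, s)
    plt.show()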
    

    This generates random Poisson counts.

    Now, the way to fit a Poisson regression to the data is the following:

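    # Note: exog is used as given, so this model has no intercept term
    my_model = poi_model(endog=s, exog=x)
    # fit() returns a results object; the name my_model is reused for it
    my_model = my_model.fit()
    my_model.summary()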
    

    The summary displays a number of statistics, but if you want to compute the mean squared error you could do it like so:

    # With no arguments, predict() returns fitted values for the training data
    preds = my_model.predict()
    mse = np.mean(np.square(preds - s))
    

    If you want to predict for new values of x, pass them as exog:

    my_model.predict(exog=new_value)
    