I am running a rolling OLS regression estimation (with a window of, for example, 100 observations)
on the dataset found in this link (https://drive.google.com/drive/folders/0B2Iv8dfU4f
Short Answer
The value of r^2 is going to be +/- inf as long as y remains constant over the regression window (100 observations in your case). You can find more details below, but the intuition is that r^2 is the proportion of y's variance explained by X: if y's variance is zero, r^2 is simply not well defined.
Possible solution: Try to use a longer window, or resample Y and X so that Y does not remain constant for so many consecutive observations.
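To check whether this is what is happening, one option is to flag the windows in which y never changes. The following is only a minimal sketch on made-up data: the helper names `constant_windows` and `r_squared` are hypothetical, and the flags are indexed by the start of each window, which may differ from how your rolling routine labels its output.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def constant_windows(y, window=100):
    """Return one boolean per full window: True if y is constant in it."""
    y = np.asarray(y, dtype=float)
    windows = sliding_window_view(y, window)  # shape: (len(y) - window + 1, window)
    return windows.max(axis=1) == windows.min(axis=1)

def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot, returning nan when SS_tot is zero."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    if ss_tot == 0:
        return float("nan")  # y has zero variance: R^2 is undefined
    return 1.0 - ss_res / ss_tot

# Toy series (an assumption for illustration): constant at first, then changing
y_demo = np.array([1, 1, 1, 1, 1, 2, 3, 4])
flags = constant_windows(y_demo, window=3)
print(flags)
print(r_squared([2, 2, 2], [2, 2, 2]))
```

If many of your 100-observation windows are flagged, a longer window or resampling is likely to help more than any change to the regression code.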
Long Answer
Looking at this I honestly think this is not the right dataset for the regression. This is a simple plot of the data:
Does a linear combination of X and time explain Y? Mmm...doesn't look plausible. Y almost looks like a discrete variable, so you probably want to look at logistic regressions.
To come to your question, R^2 is "the proportion of the variance in the dependent variable that is predictable from the independent variable(s)" (definition from Wikipedia).
In your case it is very likely that Y is constant over some windows of 100 observations, so it has zero variance there. That produces a division by zero, hence the inf.
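For reference, the standard definition behind that division by zero can be written out as below. The symbols \hat{y}_i (fitted values), \bar{y} (mean of y over the window) and n (window length) are notation introduced here, not taken from the question.

```latex
R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}, \qquad
SS_{\mathrm{res}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
SS_{\mathrm{tot}} = \sum_{i=1}^{n} (y_i - \bar{y})^2
```

If every y_i in the window equals \bar{y}, then SS_tot = 0 and the ratio has a zero denominator, so software will typically report nan or an infinite value.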
So I am afraid you should not look for fixes in the code; instead, you should rethink the problem and the way you fit the data.
Ok so I prepared this small example so you can visualize what a Poisson regression could do.
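import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

poi_model = sm.Poisson  # Poisson regression model class
# Simulate 1000 predictor values and Poisson counts with mean 0.5 * x
x = np.random.uniform(0, 20, 1000)
s = np.random.poisson(x * 0.5, 1000)
plt.bar(x, s)
plt.show()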
This generates random Poisson counts.
Now the way to fit a Poisson regression to the data is the following:
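# No intercept is added here, so the fitted model is log(E[s]) = beta * x
my_model = poi_model(endog=s, exog=x)
my_model = my_model.fit()  # returns the fitted results object
print(my_model.summary())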
The summary displays a number of statistics, but if you want to compute the mean squared error you could do it like so:
preds = my_model.predict()  # in-sample predicted means
mse = np.mean(np.square(preds - s))
If you want to predict new values, do the following:
my_model.predict(exog=new_value)  # new_value: array of new x values