How to get the prediction of test from 2D parameters of WLS regression in statsmodels

Question

I'm incrementally updating the parameters of WLS regression functions using statsmodels.

I have a 10x3 dataset X that I declared like this:

X = np.array([[1,2,3],[1,2,3],[4,5,6],[1,2,3],[4,5,6],[1,2,3],[1,2,3],[4,5,6],[4,5,6],[1,2,3]])

This is my dataset, and I have a 10x2 endog vector that looks like this:

z =
[[  3.90311860e-322   2.00000000e+000]
 [  0.00000000e+000   2.00000000e+000]
 [  0.00000000e+000  -2.00000000e+000]
 [  0.00000000e+000   2.00000000e+000]
 [  0.00000000e+000  -2.00000000e+000]
 [  0.00000000e+000   2.00000000e+000]
 [  0.00000000e+000   2.00000000e+000]
 [  0.00000000e+000  -2.00000000e+000]
 [  0.00000000e+000  -2.00000000e+000]
 [  0.00000000e+000   2.00000000e+000]]

Now, after importing statsmodels with import statsmodels.api as sm, I do this:

g = np.zeros([3, 2]) # g(x) is a function that will store the regression parameters
mod_wls = sm.WLS(z, X)
temp_g = mod_wls.fit()
print(temp_g.params)

And I get this output:

[[ -5.92878775e-323  -2.77777778e+000]
 [ -4.94065646e-324  -4.44444444e-001]
 [  4.94065646e-323   1.88888889e+000]]

Earlier, from the answer to this question, I was able to predict the value of test data X_test using numpy.dot, like this:

np.dot(X_test, temp_g.params)

I understood that easily since the endog vector, y, was a 1D array. But how does it work when my endog vector, in this case z, is 2D? When I try the above line as it was used in the 1D version, I get the following error:

   self._check_integrity()
  File "C:\Users\app\Anaconda\lib\site-packages\statsmodels\base\data.py", line 247, in _check_integrity
    raise ValueError("endog and exog matrices are different sizes")
ValueError: endog and exog matrices are different sizes

Answer 1:

np.dot(X_test, temp_g.params) should still work.

In some cases you need to check the orientation of the matrices; sometimes it's necessary to transpose. Here the params are 3x2, so X_test needs to have 3 columns.
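As a rough illustration of the shapes involved, here is a minimal sketch using only numpy. It rebuilds X and z from the question (treating the tiny 3.9e-322 entry as zero), uses np.linalg.lstsq as a stand-in for the fitted params, and assumes a made-up X_test with two rows; none of this is the asker's actual test data.

```python
import numpy as np

# X and z as in the question (the denormal 3.9e-322 entry is treated as 0).
X = np.array([[1, 2, 3], [1, 2, 3], [4, 5, 6], [1, 2, 3], [4, 5, 6],
              [1, 2, 3], [1, 2, 3], [4, 5, 6], [4, 5, 6], [1, 2, 3]], dtype=float)
z = np.column_stack([np.zeros(10), np.where(X[:, 0] == 1, 2.0, -2.0)])

# Stand-in for the fitted parameters: one column of coefficients per column of z.
params, *_ = np.linalg.lstsq(X, z, rcond=None)   # shape (3, 2)

# Hypothetical test data: rows are observations, columns match those of X.
X_test = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

# (2, 3) dot (3, 2) -> (2, 2): one predicted row per test row,
# one predicted column per column of z.
pred = np.dot(X_test, params)
print(params.shape, pred.shape)
```

If X_test were stored with observations in columns instead, it would need X_test.T before the dot product.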

However, predict and most other methods of the results instance will not work, because the model assumes that the dependent variable, z, is 1D.

The question, again, is what you are trying to do.

If you want to independently fit columns of z, then iterate over it so each y is 1D.

for y in z.T:
    res = sm.WLS(y, X).fit()

z.T allows iteration over columns.

In other cases, we usually stack the model so that y is 1D: the first part of it is z[:,0] and the second part is z[:,1]. The design matrix (the matrix of explanatory variables) has to be expanded correspondingly.
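One way the stacking could look, as a sketch rather than a recommended recipe: it assumes each column of z gets its own coefficients, so the expanded design matrix is block-diagonal. np.linalg.lstsq is used here only to keep the example numpy-only; the stacked y and design matrix are both in the 1D/2D form that sm.WLS expects.

```python
import numpy as np

# X and z as in the question (the denormal 3.9e-322 entry is treated as 0).
X = np.array([[1, 2, 3], [1, 2, 3], [4, 5, 6], [1, 2, 3], [4, 5, 6],
              [1, 2, 3], [1, 2, 3], [4, 5, 6], [4, 5, 6], [1, 2, 3]], dtype=float)
z = np.column_stack([np.zeros(10), np.where(X[:, 0] == 1, 2.0, -2.0)])

k = z.shape[1]                        # number of dependent variables (2)

# Stack the columns of z: z[:, 0] first, then z[:, 1]; length 20.
y_stacked = z.T.reshape(-1)

# Expand the design matrix to block-diagonal form [[X, 0], [0, X]]; shape (20, 6).
X_stacked = np.kron(np.eye(k), X)

# Fit the stacked 1D problem (least squares stands in for the model fit here).
beta, *_ = np.linalg.lstsq(X_stacked, y_stacked, rcond=None)

# First 3 coefficients belong to z[:, 0], last 3 to z[:, 1]; reshape to 3x2.
params = beta.reshape(k, X.shape[1]).T
print(X_stacked.shape, params.shape)
```

If the two equations were meant to share coefficients, the design matrix would be stacked vertically instead of block-diagonally; which layout applies depends on what is being modeled.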

Support for multivariate dependent variables is in the works for statsmodels, but it will still take some time to be ready.

Source: https://stackoverflow.com/questions/23369859/how-to-get-the-prediction-of-test-from-2d-parameters-of-wls-regression-in-statsm

Tags

python

arrays

numpy

regression

statsmodels