How does sklearn do linear regression when p > n?

遇见更好的自我 2020-12-09 23:41

It's known that when the number of variables (p) is larger than the number of samples (n), the least squares estimator is not defined.

In sklearn, however, LinearRegression still returns coefficients. How are they computed?

1 Answer
  • 2020-12-10 00:12

    When the linear system is underdetermined, sklearn.linear_model.LinearRegression finds the minimum-L2-norm solution, i.e.

    argmin_w ||w||_2  subject to  Xw = y
    

    This is always well defined and obtainable by applying the pseudoinverse of X to y, i.e.

    w = np.linalg.pinv(X).dot(y)
    

    The specific implementation of scipy.linalg.lstsq, which LinearRegression uses, calls get_lapack_funcs(('gelss',), ..., which is precisely a solver that finds the minimum-norm solution via the singular value decomposition (provided by LAPACK).
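    To make the SVD route concrete, here is a minimal sketch of what such a solver computes, using numpy's SVD directly rather than the LAPACK driver itself (the variable names are illustrative, not from scipy's source):

```python
import numpy as np
from scipy.linalg import lstsq

rng = np.random.RandomState(0)
X = rng.randn(5, 10)   # underdetermined: n=5 samples, p=10 features
y = rng.randn(5)

# Thin SVD: X = U @ diag(s) @ Vt, with U (5, 5), s (5,), Vt (5, 10)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Minimum-norm solution: w = V @ diag(1/s) @ U.T @ y
w_svd = Vt.T @ ((U.T @ y) / s)

# scipy's least-squares solver returns the same minimum-norm solution
w_lstsq, *_ = lstsq(X, y)
print(np.allclose(w_svd, w_lstsq))  # True
```

    Inverting only the nonzero singular values is exactly what the pseudoinverse does, which is why the two routes below agree.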

    Check out this example

    import numpy as np
    rng = np.random.RandomState(42)
    X = rng.randn(5, 10)  # n=5 samples, p=10 features: underdetermined
    y = rng.randn(5)
    
    from sklearn.linear_model import LinearRegression
    lr = LinearRegression(fit_intercept=False)
    coef1 = lr.fit(X, y).coef_        # sklearn's solution
    coef2 = np.linalg.pinv(X).dot(y)  # minimum-norm solution via pseudoinverse
    
    print(coef1)
    print(coef2)
    

    And you will see that coef1 and coef2 agree (up to floating-point precision). Note that fit_intercept=False is specified in the constructor of the sklearn estimator, because otherwise it would subtract the mean of each feature before fitting the model, yielding different coefficients.
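    One can also check the "minimum norm" claim directly: any other exact solution differs from w by a null-space vector of X and must have a strictly larger L2 norm. A small sketch (the null-space construction here is my own, not part of sklearn):

```python
import numpy as np

rng = np.random.RandomState(42)
X = rng.randn(5, 10)
y = rng.randn(5)

w = np.linalg.pinv(X).dot(y)  # minimum-norm solution

# The trailing right singular vectors span the null space of X.
_, _, Vt = np.linalg.svd(X)   # Vt is (10, 10); rows 5..9 span the null space
z = Vt[5:].T @ rng.randn(5)   # a random null-space direction, so X @ z ≈ 0
w_other = w + z               # still solves X @ w_other == y

print(np.allclose(X @ w_other, y))                  # True: both fit exactly
print(np.linalg.norm(w), np.linalg.norm(w_other))   # w has the smaller norm
```

    Since w lies in the row space of X, it is orthogonal to z, so ||w_other||^2 = ||w||^2 + ||z||^2 > ||w||^2.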
