how does sklearn do Linear regression when p >n?

落花浮王杯 提交于 2019-11-30 16:10:06
eickenberg

When the linear system is underdetermined, then the sklearn.linear_model.LinearRegression finds the minimum L2 norm solution, i.e.

argmin_w l2_norm(w) subject to Xw = y

This is always well defined and obtainable by applying the pseudoinverse of X to y, i.e.

w = np.linalg.pinv(X).dot(y)

The specific implementation of scipy.linalg.lstsq, which is used by LinearRegression uses get_lapack_funcs(('gelss',), ... which is precisely a solver that finds the minimum norm solution via singular value decomposition (provided by LAPACK).

Check out this example

import numpy as np
rng = np.random.RandomState(42)
X = rng.randn(5, 10)
y = rng.randn(5)

from sklearn.linear_model import LinearRegression
lr = LinearRegression(fit_intercept=False)
coef1 = lr.fit(X, y).coef_
coef2 = np.linalg.pinv(X).dot(y)

print(coef1)
print(coef2)

And you will see that coef1 == coef2. (Note that fit_intercept=False is specified in the constructor of the sklearn estimator, because otherwise it would subtract the mean of each feature before fitting the model, yielding different coefficients)

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!