Question
I'm trying this code:
from sklearn import linear_model
import numpy as np
x1 = np.arange(0,10,0.1)
x2 = x1*10
y = 2*x1 + 3*x2
X = np.vstack((x1, x2)).transpose()
reg_model = linear_model.LinearRegression()
reg_model.fit(X,y)
print(reg_model.coef_)
# should be [2, 3]
print(reg_model.predict([[5, 6]]))
# should be 2*5 + 3*6 = 28
print(reg_model.intercept_)
# perfectly at the expected value of 0
print(reg_model.score(X, y))
# seems rather confident that it is right
The results are:
- coef_: [ 0.31683168 3.16831683]
- predict: 20.5940594059
- intercept_: 0.0
- score: 1.0
These are not what I expected: the coefficients are not the same as the parameters used to synthesize the data. Why is this so?
Answer 1:
The problem is the uniqueness of the solution. Your two features are perfectly collinear (x2 = 10*x1, and a linear transform of one dimension does not create new information in the eyes of this model), so there are infinitely many coefficient pairs that fit your data exactly: y = 2*x1 + 3*x2 = 32*x1, and any pair (a, b) with a + 10*b = 32 reproduces y. The pair you got back, (0.31683168, 3.16831683), satisfies exactly that constraint, which is why predict and score still look perfect. Apply a non-linear transformation to the second dimension and you will see the desired output.
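To check this concretely, here is a minimal sketch (plain NumPy, not part of the original answer) verifying that several pairs on the line a + 10*b = 32, including the one the question reports, all reproduce y exactly:
import numpy as np
x1 = np.arange(0, 10, 0.1)
x2 = x1 * 10  # perfectly collinear with x1
y = 2*x1 + 3*x2  # equals 32*x1
# any (a, b) with a + 10*b = 32 is an exact fit
for a, b in [(2, 3), (32, 0), (0.31683168, 3.16831683)]:
    print(a, b, np.allclose(a*x1 + b*x2, y))
# prints True for all three pairs
With a genuinely non-linear second feature, the design matrix has full rank and the solution is unique: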
from sklearn import linear_model
import numpy as np
x1 = np.arange(0,10,0.1)
x2 = x1**2
X = np.vstack((x1, x2)).transpose()
y = 2*x1 + 3*x2
reg_model = linear_model.LinearRegression()
reg_model.fit(X,y)
print(reg_model.coef_)
# should be [2, 3]
print(reg_model.predict([[5, 6]]))
# should be 2*5 + 3*6 = 28
print(reg_model.intercept_)
# perfectly at the expected value of 0
print(reg_model.score(X, y))
The outputs are:
- [ 2. 3.]
- [ 28.]
- -2.84217094304e-14 (zero up to floating-point error)
- 1.0
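A side note, not part of the original answer: the odd coefficients in the question are not arbitrary. SVD-based least-squares solvers return the minimum-norm solution for a rank-deficient system, and minimizing a^2 + b^2 subject to a + 10*b = 32 gives a = 32/101 ≈ 0.31683168 and b = 320/101 ≈ 3.16831683, exactly the values reported. A short sketch with np.linalg.lstsq, whose documented behaviour for rank-deficient matrices is precisely this minimum-norm solution:
import numpy as np
x1 = np.arange(0, 10, 0.1)
X = np.vstack((x1, x1*10)).transpose()  # the rank-1 design matrix from the question
y = 2*x1 + 3*(x1*10)
# lstsq returns the minimum-norm least-squares solution when X is rank-deficient
coef, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(rank)  # 1 -- confirms the collinearity
print(coef)  # approximately [0.31683168  3.16831683]
print(np.array([32.0, 320.0]) / 101)  # the analytic minimum-norm solution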
Source: https://stackoverflow.com/questions/36131047/linear-regression-returns-different-results-than-synthetic-parameters