Question
I'm trying this code:
from sklearn import linear_model
import numpy as np
x1 = np.arange(0,10,0.1)
x2 = x1*10
y = 2*x1 + 3*x2
X = np.vstack((x1, x2)).transpose()
reg_model = linear_model.LinearRegression()
reg_model.fit(X,y)
print(reg_model.coef_)
# should be [2, 3]
print(reg_model.predict([[5, 6]]))
# should be 2*5 + 3*6 = 28
print(reg_model.intercept_)
# perfectly at the expected value of 0
print(reg_model.score(X, y))
# seems rather confident that it is right
The results are:
- coef_: [ 0.31683168 3.16831683]
- predict: 20.5940594059
- intercept_: 0.0
- score: 1.0
These are not what I expected: the coefficients are not the same as the parameters used to synthesize the data. Why is this so?
Answer 1:
The problem is the uniqueness of the solution. Your two features are perfectly collinear (x2 = 10*x1, and a linear transform of one dimension does not create new information in the eyes of this model), so there are infinitely many coefficient pairs that fit your data exactly: y = 2*x1 + 3*x2 = 32*x1, and any pair (a, b) with a + 10*b = 32 reproduces y. The pair you got back, (0.31683168, 3.16831683), satisfies exactly that constraint, which is why predict and score still look perfect. Apply a non-linear transformation to the second dimension and you will see the desired output.
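To check this concretely, here is a minimal sketch (plain NumPy, not part of the original answer) verifying that several pairs on the line a + 10*b = 32, including the one the question reports, all reproduce y exactly:
import numpy as np
x1 = np.arange(0, 10, 0.1)
x2 = x1 * 10  # perfectly collinear with x1
y = 2*x1 + 3*x2  # equals 32*x1
# any (a, b) with a + 10*b = 32 is an exact fit
for a, b in [(2, 3), (32, 0), (0.31683168, 3.16831683)]:
    print(a, b, np.allclose(a*x1 + b*x2, y))
# prints True for all three pairs
With a genuinely non-linear second feature, the design matrix has full rank and the solution is unique: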
from sklearn import linear_model
import numpy as np
x1 = np.arange(0,10,0.1)
x2 = x1**2
X = np.vstack((x1, x2)).transpose()
y = 2*x1 + 3*x2
reg_model = linear_model.LinearRegression()
reg_model.fit(X,y)
print(reg_model.coef_)
# should be [2, 3]
print(reg_model.predict([[5, 6]]))
# should be 2*5 + 3*6 = 28
print(reg_model.intercept_)
# perfectly at the expected value of 0
print(reg_model.score(X, y))
The outputs are:
- [ 2. 3.]
- [ 28.]
- -2.84217094304e-14 (zero up to floating-point error)
- 1.0
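A side note, not part of the original answer: the odd coefficients in the question are not arbitrary. SVD-based least-squares solvers return the minimum-norm solution for a rank-deficient system, and minimizing a^2 + b^2 subject to a + 10*b = 32 gives a = 32/101 ≈ 0.31683168 and b = 320/101 ≈ 3.16831683, exactly the values reported. A short sketch with np.linalg.lstsq, whose documented behaviour for rank-deficient matrices is precisely this minimum-norm solution:
import numpy as np
x1 = np.arange(0, 10, 0.1)
X = np.vstack((x1, x1*10)).transpose()  # the rank-1 design matrix from the question
y = 2*x1 + 3*(x1*10)
# lstsq returns the minimum-norm least-squares solution when X is rank-deficient
coef, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(rank)  # 1 -- confirms the collinearity
print(coef)  # approximately [0.31683168  3.16831683]
print(np.array([32.0, 320.0]) / 101)  # the analytic minimum-norm solution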
Source: https://stackoverflow.com/questions/36131047/linear-regression-returns-different-results-than-synthetic-parameters