LinearRegression() returns list within list (sklearn)

Posted by 守給你的承諾、 on 2019-12-21 17:38:46

Question


I'm doing multivariate linear regression in Python (sklearn), but for some reason, the coefficients are not correctly returned as a list. Instead, a list IN A LIST is returned:

from sklearn import linear_model
clf = linear_model.LinearRegression()
# clf.fit ([[0, 0, 0], [1, 1, 1], [2, 2, 2]], [0, 1, 2])
clf.fit([[394, 3878, 13, 4, 0, 0],[384, 10175, 14, 4, 0, 0]],[3,9])
print 'coef array',clf.coef_
print 'length', len(clf.coef_)
print 'getting value 0:', clf.coef_[0]
print 'getting value 1:', clf.coef_[1]

This returns the values in a list of a list [[]] instead of a list []. Any idea why this is happening? Output:

coef array [[  1.03428648e-03   9.54477167e-04   1.45135995e-07   0.00000000e+00
0.00000000e+00   0.00000000e+00]]
length 1
getting value 0: [  1.03428648e-03   9.54477167e-04   1.45135995e-07   0.00000000e+00
   0.00000000e+00   0.00000000e+00]
getting value 1:
Traceback (most recent call last):
  File "regress.py", line 8, in <module>
    print 'getting value 1:', clf.coef_[1]
IndexError: index out of bounds

But this works:

from sklearn import linear_model
clf = linear_model.LinearRegression()
clf.fit ([[0, 0, 0], [1, 1, 1], [2, 2, 2]], [0, 1, 2])
# clf.fit([[394, 3878, 13, 4, 0, 0],[384, 10175, 14, 4, 0, 0]],[3,9])
print 'coef array',clf.coef_
print 'length', len(clf.coef_)
print 'getting value 0:', clf.coef_[0]
print 'getting value 1:', clf.coef_[1]

Output:

coef array [ 0.33333333  0.33333333  0.33333333]
length 3
getting value 0: 0.333333333333
getting value 1: 0.333333333333

Answer 1:


This seems to be an issue with scipy.linalg. If you trace the call chain, it first goes through https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/base.py#L218 and then reaches the if statement at https://github.com/scipy/scipy/blob/master/scipy/linalg/basic.py#L468. That if statement differentiates your two test cases: in the first case m, n = 2, 6 (2 rows, 6 columns), and in the second m, n = 3, 3.
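
To make the m, n values concrete, here is a minimal sketch that computes the shape of the two training matrices from the question. The helper name `shape` is hypothetical, and the `m < n` comparison is only an assumption about what sets the two cases apart; the exact condition is the one in the linked scipy source. It uses Python 3 print syntax.

```python
# The two training matrices from the question.
X_first = [[394, 3878, 13, 4, 0, 0], [384, 10175, 14, 4, 0, 0]]
X_second = [[0, 0, 0], [1, 1, 1], [2, 2, 2]]


def shape(X):
    """Return (m, n): the number of rows and the number of columns of X."""
    return len(X), len(X[0])


m1, n1 = shape(X_first)
m2, n2 = shape(X_second)

# Assumption: the relevant difference is whether there are fewer rows than columns.
print("first case:  m, n =", (m1, n1), "| fewer rows than columns:", m1 < n1)
print("second case: m, n =", (m2, n2), "| fewer rows than columns:", m2 < n2)
```

Only the first matrix has fewer rows (samples) than columns (features), which matches the case where the nested coefficients show up.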




Answer 2:


This is fixed by updating two files in your scikit-learn installation folder.

The code is here: https://github.com/scikit-learn/scikit-learn/commit/d0b20f0a21ba42b85375b1fbc7202dc3962ae54f




Answer 3:


I have never used the module for multivariate linear regression that you are referring to, so I cannot say why this is happening. But if you just want to solve your problem, you can flatten the list:

flat_list = clf.coef_[0]

If the list may have more than one sublist (and you want to combine them all into a flat list), then you can use a more general way to flatten it:

flat_list = [item for sublist in clf.coef_ for item in sublist]

EDIT: While waiting for a real explanation/solution from the package's developers, you could rely on a solution like this:

# coef_ is a NumPy array, so check its dimensionality instead of isinstance(..., list)
if clf.coef_.ndim > 1:
    clf.coef_ = clf.coef_[0]

That flattens the list only if there is a sublist inside of it.
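
The conditional flattening above can also be sketched as a small helper that accepts either shape. This is only an illustration: `flatten_coefficients` is a hypothetical name, and plain Python lists with values copied from the question's output stand in for `clf.coef_`. It uses Python 3 print syntax.

```python
def flatten_coefficients(coef):
    """Return the coefficients as a flat list, whether or not they arrive nested."""
    flat = []
    for item in coef:
        try:
            # item is itself a sequence (a nested row): add its elements.
            flat.extend(item)
        except TypeError:
            # item is a single number: add it directly.
            flat.append(item)
    return flat


# Stand-ins for clf.coef_ in the two cases from the question.
nested = [[1.03428648e-03, 9.54477167e-04, 1.45135995e-07, 0.0, 0.0, 0.0]]
already_flat = [0.33333333, 0.33333333, 0.33333333]

print(len(flatten_coefficients(nested)))        # 6
print(len(flatten_coefficients(already_flat)))  # 3
```

Relying on `TypeError` instead of a type check means the same helper should also accept array rows, since it only requires that nested items be iterable.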




Answer 4:


This really isn't a question about the Python language; it should be directed to the developers of sklearn. But if you know that this is the format your data will be returned in, you could just index into the inner list:

print 'getting value 0:', clf.coef_[0][0]
print 'getting value 1:', clf.coef_[0][1]
                                   ^^^ 


Source: https://stackoverflow.com/questions/11549486/linearregression-returns-list-within-list-sklearn
