sklearn: Found arrays with inconsistent numbers of samples when calling LinearRegression.fit()

Anonymous (unverified), submitted 2019-12-03 03:03:02

Question:

Just trying to do a simple linear regression but I'm baffled by this error for:

regr = LinearRegression()
regr.fit(df2.iloc[1:1000, 5].values, df2.iloc[1:1000, 2].values)

which produces:

ValueError: Found arrays with inconsistent numbers of samples: [  1 999] 

These selections must have the same dimensions, and they should be numpy arrays, so what am I missing?

Answer 1:

It looks like sklearn requires X to have the shape (number of rows, number of columns). If your data has the one-dimensional shape (number of rows,), like (999,), it does not work. Use numpy.reshape() to change it to (999, 1), e.g.

data.reshape((999,1)) 

In my case, it worked with that.
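The reshape above can be sketched end to end as follows. The data here is a made-up stand-in (a single feature with y = 2x + 1), not the asker's df2. The np.atleast_2d line is only an illustration of how a 1-D array of 999 values can end up being read as 1 sample with 999 features, which would match the [1 999] in the error message; treat that reading as an assumption about the sklearn version in use at the time.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in data: 999 samples of one feature, with y = 2x + 1.
x = np.arange(999, dtype=float)   # shape (999,), one-dimensional
y = 2.0 * x + 1.0                 # shape (999,)

# Illustration only: promoting a 1-D array to 2-D puts all values in one row.
promoted = np.atleast_2d(x)       # shape (1, 999)

# Reshape the feature into a single column; -1 lets NumPy infer the row count.
X = x.reshape(-1, 1)              # shape (999, 1)

regr = LinearRegression()
regr.fit(X, y)

print(promoted.shape)   # (1, 999)
print(X.shape)          # (999, 1)
```

Writing reshape(-1, 1) rather than reshape((999, 1)) avoids hard-coding the number of rows.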



Answer 2:

It looks like you are using a pandas DataFrame (judging from the name df2).

You could also do the following:

regr = LinearRegression()
regr.fit(df2.iloc[1:1000, 5].to_frame(), df2.iloc[1:1000, 2].to_frame())

NOTE: I have removed "values" as that converts the pandas Series to numpy.ndarray and numpy.ndarray does not have attribute to_frame().
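To see what to_frame() changes, here is a small sketch on a made-up frame (the df2 below is a hypothetical stand-in, not the asker's data). It only compares shapes: the Series is one-dimensional, while the one-column DataFrame is two-dimensional.

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for df2: 4 rows, 6 columns.
df2 = pd.DataFrame(np.arange(24).reshape(4, 6))

col = df2.iloc[1:4, 5]     # a Series: one-dimensional
frame = col.to_frame()     # a one-column DataFrame: two-dimensional

print(col.shape)     # (3,)
print(frame.shape)   # (3, 1)
```

Only X has to be two-dimensional; a one-dimensional y is also accepted by fit().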



Answer 3:

I think the "X" argument of regr.fit needs to be a matrix, so the following should work.

regr = LinearRegression()
regr.fit(df2.iloc[1:1000, [5]].values, df2.iloc[1:1000, 2].values)
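The only difference from the question's code is [5] instead of 5. A short sketch on a made-up frame (hypothetical stand-in for df2) shows the effect on the array shape:

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for df2: 10 rows, 6 columns.
df2 = pd.DataFrame(np.arange(60).reshape(10, 6))

as_vector = df2.iloc[1:10, 5].values     # scalar column index gives a 1-D array
as_matrix = df2.iloc[1:10, [5]].values   # list column index gives a 2-D array

print(as_vector.shape)   # (9,)
print(as_matrix.shape)   # (9, 1)
```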


Answer 4:

I encountered this error because I converted my data to an np.array. I fixed the problem by converting my data to an np.matrix instead and taking the transpose.

Raises ValueError: regr.fit(np.array(x_list), np.array(y_list))

Correct: regr.fit(np.transpose(np.matrix(x_list)), np.transpose(np.matrix(y_list)))
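The transposed matrix works because it has shape (N, 1). As a sketch, assuming x_list is a flat list of numbers (the values below are made up), a plain ndarray reshaped to one column gives the same layout without np.matrix, which the NumPy documentation no longer recommends for new code:

```python
import numpy as np

# Hypothetical flat list of feature values.
x_list = [1.0, 2.0, 3.0, 4.0]

as_array = np.array(x_list)                      # shape (4,), the failing input
as_matrix_t = np.transpose(np.matrix(x_list))    # shape (4, 1), the answer's fix
as_column = np.array(x_list).reshape(-1, 1)      # shape (4, 1), plain ndarray

print(as_array.shape)      # (4,)
print(as_matrix_t.shape)   # (4, 1)
print(as_column.shape)     # (4, 1)
```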



Answer 5:

LinearRegression.fit() expects X to be a feature matrix.

Try putting your feature names in a list like this:

features = ['TV', 'Radio', 'Newspaper']
X = data[features]
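A minimal sketch of this approach, assuming a DataFrame named data with those three feature columns. The numbers and the 'Sales' target column are invented for illustration only.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up values; the 'Sales' target column is an assumption for this sketch.
data = pd.DataFrame({
    'TV': [10.0, 20.0, 30.0, 40.0, 50.0],
    'Radio': [5.0, 3.0, 8.0, 1.0, 6.0],
    'Newspaper': [2.0, 7.0, 4.0, 9.0, 3.0],
    'Sales': [12.0, 18.0, 30.0, 33.0, 45.0],
})

features = ['TV', 'Radio', 'Newspaper']
X = data[features]    # selecting with a list gives a 2-D DataFrame
y = data['Sales']     # a 1-D Series is fine for the target

regr = LinearRegression()
regr.fit(X, y)

print(X.shape)            # (5, 3)
print(regr.coef_.shape)   # (3,)
```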


Answer 6:

Seen on the Udacity deep learning foundation course:

df = pd.read_csv('my.csv')
...
regr = LinearRegression()
regr.fit(df[['column x']], df[['column y']])


Answer 7:

As mentioned above, the X argument must be a matrix or a two-dimensional numpy array. So you could probably use this:

df2.iloc[1:1000, 5:some_last_index].values

Slicing the columns with a range keeps the result two-dimensional, so your dataframe is converted to an array with the right dimensions and you won't need to reshape it.



Answer 8:

To analyze two arrays (array1 and array2) they need to meet the following two requirements:

1) They need to be a numpy.ndarray

Check with

type(array1) # and type(array2) 

If that is not the case for at least one of them, convert it with

array1 = numpy.asarray(array1) # or array2 = numpy.asarray(array2)

2) The dimensions need to be as follows:

array1.shape # shall give (N, 1)
array2.shape # shall give (N,)

N is the number of items that are in the array. To provide array1 with the right number of axes perform:

array1 = array1[:, numpy.newaxis] 
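Both requirements can be checked together in a short sketch. The input lists are made up, and numpy.asarray is used for the conversion because calling numpy.ndarray directly expects a shape rather than data.

```python
import numpy as np

# Hypothetical inputs that start out as plain Python lists.
array1 = [1.0, 2.0, 3.0]
array2 = [2.0, 4.0, 6.0]

# Requirement 1: both must be numpy.ndarray.
array1 = np.asarray(array1)
array2 = np.asarray(array2)

# Requirement 2: give array1 a second axis so its shape becomes (N, 1).
array1 = array1[:, np.newaxis]

print(type(array1).__name__)   # ndarray
print(array1.shape)            # (3, 1)
print(array2.shape)            # (3,)
```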

