Linear Regression on Pandas DataFrame using Sklearn (IndexError: tuple index out of range)

孤街浪徒 2020-12-08 15:14

I'm new to Python and am trying to perform linear regression using sklearn on a pandas DataFrame. This is what I did:

data = pd.read_csv('xxxx.csv')
5 Answers
  • 2020-12-08 15:36

    This answer addresses exactly the error that you got:

    IndexError: tuple index out of range

    Scikit-learn expects X to be a 2D array of shape (n_samples, n_features). Reshape X; reshaping Y to match is optional, since fit also accepts a 1D target.

    Replace:

    X = data['c1'].values  # shape (XXX,): 1D
    Y = data['c2'].values  # shape (XXX,): 1D
    linear_model.LinearRegression().fit(X, Y)
    

    with

    X = data['c1'].values.reshape(-1, 1)  # shape (XXX, 1): 2D
    Y = data['c2'].values.reshape(-1, 1)  # shape (XXX, 1): 2D
    linear_model.LinearRegression().fit(X, Y)
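
A minimal sketch of the shape difference described above. The DataFrame here is a hypothetical stand-in for the questioner's CSV (the column names c1 and c2 come from the answer, the values are made up). The exact exception type depends on the scikit-learn version: the question reports an IndexError, while recent releases raise a ValueError for a 1D X.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data standing in for the CSV: c2 = 3 * c1 + 2
data = pd.DataFrame({"c1": np.arange(10, dtype=float)})
data["c2"] = 3.0 * data["c1"] + 2.0

X_1d = data["c1"].values        # shape (10,): rejected as X
X_2d = X_1d.reshape(-1, 1)      # shape (10, 1): one feature column
Y = data["c2"].values           # shape (10,): a 1D target is accepted

try:
    LinearRegression().fit(X_1d, Y)
    failed = False
except (IndexError, ValueError) as exc:
    # Which of the two is raised depends on the installed version
    failed = True
    print("1D X rejected:", type(exc).__name__)

model = LinearRegression().fit(X_2d, Y)
print(X_1d.shape, X_2d.shape)
```

The -1 in reshape(-1, 1) lets NumPy infer the number of rows, so the same line works for any column length.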
    
  • 2020-12-08 15:43

    Have a look at the scikit-learn documentation for the fit method.

    To see how to visualize a linear regression, play with the linear regression example in the scikit-learn docs. I'm guessing you haven't used IPython notebooks (now called Jupyter) much either, so it is worth investing some time in learning them. It's a great tool for exploring data and machine learning. You can copy and paste the scikit-learn linear regression example into a notebook and run it.

    For your specific problem with the fit method, by referring to the docs, you can see that the format of the data you are passing in for your X values is wrong.

    Per the docs, "X : numpy array or sparse matrix of shape [n_samples,n_features]"

    You can fix your code by wrapping each value in its own list, which gives one row per sample and one column per feature:

    X = [[x] for x in data['c1'].values]
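
As a rough illustration, the list comprehension above produces the same (n_samples, 1) layout as the NumPy reshaping idioms. The array c1 below is a made-up stand-in for data['c1'].values.

```python
import numpy as np

# Stand-in for data['c1'].values (hypothetical values)
c1 = np.array([0.0, 1.0, 2.0, 3.0])

X_list = [[x] for x in c1]        # nested list, as in the answer
X_reshape = c1.reshape(-1, 1)     # NumPy reshape
X_newaxis = c1[:, np.newaxis]     # add a trailing axis

print(np.asarray(X_list).shape, X_reshape.shape, X_newaxis.shape)
```

All three are accepted by fit; the NumPy versions avoid building a Python list, which matters only for large columns.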
    
  • 2020-12-08 15:49

    Let's assume your csv looks something like:

    c1,c2
    0.000000,0.968012
    1.000000,2.712641
    2.000000,11.958873
    3.000000,10.889784
    ...
    

    I generated the data as such:

    import numpy as np
    import pandas as pd  # needed for pd.read_csv below
    from sklearn import linear_model
    import matplotlib.pyplot as plt
    
    length = 10
    x = np.arange(length, dtype=float).reshape((length, 1))
    y = x + (np.random.rand(length)*10).reshape((length, 1))
    

    This data is saved to test.csv. That is only so you know where it came from; you'll use your own file.
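
The answer does not show how the data was written out. One possible way, sketched here with an in-memory buffer instead of a real test.csv so that nothing is written to disk, is to put the two arrays into a DataFrame and round-trip it through CSV. The names frame and buffer are illustrative.

```python
import io

import numpy as np
import pandas as pd

length = 10
x = np.arange(length, dtype=float).reshape((length, 1))
y = x + (np.random.rand(length) * 10).reshape((length, 1))

# DataFrame columns must be 1D, so flatten the (length, 1) arrays
frame = pd.DataFrame({"c1": x.ravel(), "c2": y.ravel()})

# Write CSV text to a buffer; pass a path such as 'test.csv' to write a file instead
buffer = io.StringIO()
frame.to_csv(buffer, index=False)
buffer.seek(0)

data = pd.read_csv(buffer, header=0)
print(data.columns.tolist(), data.shape)
```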

    data = pd.read_csv('test.csv', index_col=False, header=0)
    x = data.c1.values
    y = data.c2.values
    print(x)  # prints: [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
    

    You need to take a look at the shape of the data you are feeding into .fit().

    Here x.shape is (10,), but fit() needs it to be (10, 1); see the sklearn docs. y is accepted as 1D, but we reshape it as well so that both arrays have the same layout:

    x = x.reshape(-1, 1)  # -1 infers the row count, so `length` is not needed
    y = y.reshape(-1, 1)
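
A small sketch of what the shape of y changes in practice. X must be 2D, whereas y can be 1D or 2D, and the fitted attributes and predictions follow the shape of y. The data is synthetic (y = 2x + 1), chosen only to make the shapes easy to inspect.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.arange(10, dtype=float).reshape(-1, 1)   # (10, 1)
y_1d = 2.0 * x.ravel() + 1.0                    # (10,)
y_2d = y_1d.reshape(-1, 1)                      # (10, 1)

m1 = LinearRegression().fit(x, y_1d)
m2 = LinearRegression().fit(x, y_2d)

print(m1.coef_.shape, m1.predict(x).shape)   # (1,) (10,)
print(m2.coef_.shape, m2.predict(x).shape)   # (1, 1) (10, 1)
```

Both fits find the same line; the 2D target just adds an output dimension to coef_, intercept_ and the predictions.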
    

    Now we create the regression object and then call fit():

    regr = linear_model.LinearRegression()
    regr.fit(x, y)
    
    # plot it as in the example at http://scikit-learn.org/
    plt.scatter(x, y,  color='black')
    plt.plot(x, regr.predict(x), color='blue', linewidth=3)
    plt.xticks(())
    plt.yticks(())
    plt.show()
    

    See the sklearn linear regression example.

  • 2020-12-08 15:51

    make predictions based on the result?

    To predict, call predict on the fitted model:

    lr = linear_model.LinearRegression().fit(X,Y)
    lr.predict(X)
    

    Is there any way I can view details of the regression?

    A fitted LinearRegression object has coef_ and intercept_ attributes:

    lr.coef_
    lr.intercept_
    

    These hold the slope and the intercept of the fitted line.
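
A short sketch of predicting for values the model has not seen and reading back the fitted parameters. The data is synthetic (Y = 2X + 1) and the names X_new, slope, intercept and r2 are illustrative. With a 2D Y, coef_ has shape (1, 1) and intercept_ has shape (1,), hence the indexing.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.arange(5, dtype=float).reshape(-1, 1)
Y = 2.0 * X + 1.0                      # shape (5, 1), exactly linear

lr = LinearRegression().fit(X, Y)

# New inputs must also be 2D: one row per sample
X_new = np.array([[10.0], [20.0]])
pred = lr.predict(X_new)

slope = lr.coef_[0][0]
intercept = lr.intercept_[0]
r2 = lr.score(X, Y)                    # coefficient of determination (R^2)
print(slope, intercept, r2)
```

For more detailed regression output (standard errors, p-values), a statistics package is needed; scikit-learn's LinearRegression exposes only the fitted parameters and score.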

  • 2020-12-08 15:58

    Dataset

    Importing the libraries

    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    from sklearn.linear_model import LinearRegression
    

    Importing the dataset

    dataset = pd.read_csv('1.csv')
    X = dataset[["mark1"]]
    y = dataset[["mark2"]]
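
The double brackets above are what keep X two-dimensional. A minimal sketch of the difference, using a made-up table in place of 1.csv (only the column names mark1 and mark2 come from the answer):

```python
import pandas as pd

# Hypothetical values standing in for 1.csv
dataset = pd.DataFrame({"mark1": [35, 50, 65, 80],
                        "mark2": [40, 52, 70, 85]})

as_series = dataset["mark1"]     # single brackets: Series, shape (4,)
as_frame = dataset[["mark1"]]    # double brackets: DataFrame, shape (4, 1)

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```

Passing the one-column DataFrame to fit avoids the explicit reshape used in the other answers.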
    

    Fitting Simple Linear Regression to the set

    regressor = LinearRegression()
    regressor.fit(X, y)
    

    Predicting the set results

    y_pred = regressor.predict(X)
    

    Visualising the set results

    plt.scatter(X, y, color = 'red')
    plt.plot(X, regressor.predict(X), color = 'blue')
    plt.title('mark1 vs mark2')
    plt.xlabel('mark1')
    plt.ylabel('mark2')
    plt.show()
    
