Turning a Pandas DataFrame into an array and evaluating a Multiple Linear Regression model

一向 2020-12-18 07:36

I am trying to evaluate a multiple linear regression model. I have a data set like this:

[image missing]

2 Answers
  •  星月不相逢
    2020-12-18 08:25

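    In pandas versions before 1.0, you can turn the dataframe into a NumPy array by calling the as_matrix method directly on the dataframe object. You will usually want to select only the columns you are interested in, e.g. X = df[['x1','x2','x3']].as_matrix(), where x1, x2 and x3 are the column names. Note that as_matrix was deprecated in pandas 0.23 and removed in pandas 1.0; on current versions use .values or .to_numpy() instead.

    For the target variable, use y = df['ground_truth'].values to get a one-dimensional array.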

    Here is an example with some randomly generated data:

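    import numpy as np
    import pandas as pd

    # Create a 5x5 DataFrame of random integers in [0, 100]
    # (randint's upper bound is exclusive; random_integers is deprecated)
    df = pd.DataFrame(np.random.randint(0, 101, (5, 5)), columns=['X1', 'X2', 'X3', 'X4', 'y'])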
    

    Calling as_matrix() on the selected columns returns a numpy.ndarray object (pandas before 1.0 only):

    X = df[['X1','X2','X3','X4']].as_matrix()
    

    Accessing values on a pandas Series also returns a numpy.ndarray:

    y = df['y'].values
    

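    Note: On pandas 0.23 to 0.25 you might get a warning saying: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead. From pandas 1.0 onward the method no longer exists, and the call raises an AttributeError.

    To fix it, use values instead of as_matrix: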

    X = df[['X1','X2','X3','X4']].values
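
    Once X and y are NumPy arrays, the model from the question can be fitted and evaluated. The following is only a minimal sketch, not the asker's actual setup: the data is synthetic (the 50 rows, the coefficients used to build y, and the noise level are all assumptions made for illustration), and it uses plain NumPy least squares rather than any particular regression library. It reports in-sample R^2 and RMSE.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 50 rows, four predictors, and a target built from
# two of them plus Gaussian noise (all values are made up for illustration)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 101, (50, 4)), columns=['X1', 'X2', 'X3', 'X4'])
df['y'] = 2.0 * df['X1'] - 1.0 * df['X3'] + rng.normal(0, 5, 50)

# Convert to NumPy arrays as described in the answer
X = df[['X1', 'X2', 'X3', 'X4']].values   # shape (50, 4)
y = df['y'].values                         # shape (50,)

# Prepend a column of ones so the fit includes an intercept term
A = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: coef[0] is the intercept, coef[1:] the slopes
coef, _, _, _ = np.linalg.lstsq(A, y, rcond=None)

# In-sample evaluation: coefficient of determination and root mean squared error
y_pred = A @ coef
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
rmse = np.sqrt(ss_res / len(y))

print("intercept and coefficients:", coef)
print("R^2:", r2)
print("RMSE:", rmse)
```

    These scores are computed on the same rows used for fitting, so they are optimistic. For a real evaluation you would hold out part of the data and compute the same metrics on the held-out rows.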
    
