Run an OLS regression with Pandas Data Frame

温柔的废话 2020-11-30 16:48

I have a pandas data frame and I would like to be able to predict the values of column A from the values in columns B and C. Here is a toy example:



        
5 answers
  •  独厮守ぢ
    2020-11-30 17:32

    "This would require me to reformat the data into lists inside lists, which seems to defeat the purpose of using pandas in the first place."

    No, it doesn't; just convert the DataFrame to a NumPy array:

    >>> import numpy as np
    >>> data = np.asarray(df)
    

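    When all columns share a single dtype, this is cheap because NumPy can usually return a view on the DataFrame's data instead of copying it; with mixed dtypes, pandas has to copy the values into a common dtype first. Then feed the array to scikit-learn: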

    >>> from sklearn.linear_model import LinearRegression
    >>> lr = LinearRegression()
    >>> # Column 0 is A (the target); the remaining columns, B and C, are the features.
    >>> X, y = data[:, 1:], data[:, 0]
    >>> lr.fit(X, y)
    LinearRegression(copy_X=True, fit_intercept=True, normalize=False)
    >>> lr.coef_
    array([  4.01182386e-01,   3.51587361e-04])
    >>> lr.intercept_
    14.952479503953672
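
The same fit can also be sketched directly on the DataFrame, since scikit-learn accepts one as input, which lets you pick columns by name rather than by position. The toy values below are an assumption (the question's own example is not reproduced above), and `new_rows` is a made-up row used only to illustrate `predict`. With these assumed values the coefficients and intercept should come out close to the ones printed above.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data (an assumption; the question's own example is not shown).
df = pd.DataFrame({
    "A": [10, 20, 30, 40, 50],
    "B": [20, 30, 10, 40, 50],
    "C": [32, 234, 23, 23, 42523],
})

# Select the features and the target by column name instead of by position.
X = df[["B", "C"]]
y = df["A"]

lr = LinearRegression()  # ordinary least squares with an intercept
lr.fit(X, y)

print(lr.coef_)       # one coefficient per feature, in the order B, C
print(lr.intercept_)

# Predict A for a new, made-up row; reuse the column names used for fitting.
new_rows = pd.DataFrame({"B": [25], "C": [100]})
predicted = lr.predict(new_rows)
print(predicted)
```

Selecting by name keeps the code correct even if the column order of the DataFrame changes, which the positional slicing `data[:, 1:]` does not.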
    
