I have a pandas
data frame and I would like to able to predict the values of column A from the values in columns B and C. Here is a toy example:
This would require me to reformat the data into lists inside lists, which seems to defeat the purpose of using pandas in the first place.
No it doesn't, just convert to a NumPy array:
>>> data = np.asarray(df)
This takes constant time because it just creates a view on your data. Then feed it to scikit-learn:
>>> from sklearn.linear_model import LinearRegression
>>> lr = LinearRegression()
>>> X, y = data[:, 1:], data[:, 0]
>>> lr.fit(X, y)
LinearRegression(copy_X=True, fit_intercept=True, normalize=False)
>>> lr.coef_
array([ 4.01182386e-01, 3.51587361e-04])
>>> lr.intercept_
14.952479503953672