Question
I want to predict a value at a date in the future with simple linear regression, but I can't due to the date format.
This is the dataframe I have:
data_df =
date value
2016-01-15 1555
2016-01-16 1678
2016-01-17 1789
...
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
y = np.asarray(data_df['value'])
X = data_df[['date']]
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=.7, random_state=42)
model = LinearRegression()  # create linear regression object
model.fit(X_train, y_train)  # train model on train data
model.score(X_train, y_train)  # check score
print('Coefficient: \n', model.coef_)
print('Intercept: \n', model.intercept_)
coefs = zip(model.coef_, X.columns)
model.__dict__
print("sl = %.1f + " % model.intercept_ +
      " + ".join("%.1f %s" % coef for coef in coefs))  # linear model
I tried to convert the date, but without success:
data_df['conv_date'] = data_df.date.apply(lambda x: x.toordinal())
data_df['conv_date'] = pd.to_datetime(data_df.date, format="%Y-%M-%D")
Answer 1:
Linear regression doesn't work on date data, so we need to convert the dates into numerical values. The following code converts each date into its ordinal number:
import datetime as dt
import pandas as pd
# the column is named 'date' in the question's dataframe
data_df['date'] = pd.to_datetime(data_df['date'])
data_df['date'] = data_df['date'].map(dt.datetime.toordinal)
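To show how the ordinal conversion fits into the whole workflow, here is a minimal sketch that fits a model and predicts a value for a later date. Only the first three rows come from the question; the other sample values, the `date_ordinal` column name, and the target date 2016-02-01 are assumptions made up for illustration.

```python
import datetime as dt

import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical sample data; only the first three rows come from the question.
data_df = pd.DataFrame({
    "date": ["2016-01-15", "2016-01-16", "2016-01-17", "2016-01-18", "2016-01-19"],
    "value": [1555, 1678, 1789, 1901, 2010],
})

# Parse the strings, then map each date to its proleptic Gregorian ordinal (an int).
data_df["date"] = pd.to_datetime(data_df["date"])
data_df["date_ordinal"] = data_df["date"].map(dt.datetime.toordinal)

X = data_df[["date_ordinal"]]  # 2-D feature matrix, as scikit-learn expects
y = data_df["value"]

model = LinearRegression()
model.fit(X, y)

# Predict for a future date by converting it the same way as the training dates.
future_date = dt.date(2016, 2, 1)  # assumed target date
X_future = pd.DataFrame({"date_ordinal": [future_date.toordinal()]})
prediction = model.predict(X_future)[0]
print(f"Predicted value on {future_date}: {prediction:.1f}")
```

The key point is that the date you predict for has to go through the same conversion as the training dates.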
Answer 2:
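Convert the date in two steps:
1) Make the date column the dataframe index (it must already be of datetime type)
df = df.set_index('date', append=False)
2) Convert the datetime index to float64 Julian dates, storing the result in a new column (here called julian_date, an arbitrary name) so the rest of the dataframe is kept
df['julian_date'] = df.index.to_julian_date()
Then run the regression with the Julian date as the independent variable.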
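The two steps above can be sketched as follows, using the three rows from the question. The `julian_date` column name is an assumption, and the `pd.to_datetime` call is added on the assumption that the dates start out as strings.

```python
import pandas as pd

# The three rows shown in the question.
df = pd.DataFrame({
    "date": ["2016-01-15", "2016-01-16", "2016-01-17"],
    "value": [1555, 1678, 1789],
})

# to_julian_date() is a DatetimeIndex method, so parse the strings first.
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date")

# Keep the result in a new column so the "value" column is preserved.
df["julian_date"] = df.index.to_julian_date()

print(df)
```

Because Julian dates are floats, they also carry the time of day as a fraction, which the integer ordinal does not.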
Answer 3:
Linear regression works on numerical data, and the datetime type is not appropriate here. You should split that column into three separate columns (year, month and day) and then remove the original column.
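A minimal sketch of that split, assuming the dates are strings that `pd.to_datetime` can parse. The `year`, `month` and `day` column names are my own choice.

```python
import pandas as pd

# The three rows shown in the question.
df = pd.DataFrame({
    "date": ["2016-01-15", "2016-01-16", "2016-01-17"],
    "value": [1555, 1678, 1789],
})
df["date"] = pd.to_datetime(df["date"])

# The .dt accessor exposes the calendar components as integer columns.
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["day"] = df["date"].dt.day

# Drop the original datetime column, as suggested above.
X = df.drop(columns=["date", "value"])
y = df["value"]
print(X)
```

Note that this gives the model three independent features rather than a single time axis, so it may be worth comparing the result with a single-number encoding when the goal is to extrapolate a trend.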
Answer 4:
When using
dt.datetime.toordinal
be careful: it only converts the date part and does not take minutes, seconds, etc. into account. To generate a numerical value from full datetime objects, you can use something like:
import time
df['Datetime column'].apply(lambda x: time.mktime(x.timetuple()))
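To make the difference concrete, here is a small sketch comparing the two conversions on two timestamps from the same day. The timestamps are made up for illustration. Keep in mind that `time.mktime` interprets the time tuple as local time, so the absolute numbers depend on the machine's timezone.

```python
import datetime as dt
import time

# Two hypothetical timestamps on the same calendar day.
morning = dt.datetime(2016, 1, 15, 8, 0, 0)
evening = dt.datetime(2016, 1, 15, 20, 30, 0)

# toordinal() counts whole days, so both timestamps collapse to the same number.
same_ordinal = morning.toordinal() == evening.toordinal()

# time.mktime() returns seconds since the epoch (reading the tuple as local
# time), so the time of day is preserved.
seconds_apart = time.mktime(evening.timetuple()) - time.mktime(morning.timetuple())

print(same_ordinal)
print(seconds_apart)
```

If the data only has one row per day, the ordinal is enough; the timestamp version matters when several observations share a date.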
Source: https://stackoverflow.com/questions/40217369/python-linear-regression-predict-by-date