I want to predict a value at a date in the future with simple linear regression, but I can't due to the date format.
This is the dataframe I have:
It is important to be clear about the data types you feed into regression or classification.
Time series modelling is a separate case, but if you want to use time as a numerical input, you should convert it from datetime to float. This assumes data_df['conv_date'] is already a datetime column; if it is not, convert it first with:
data_df['conv_date'] = pd.to_datetime(data_df.date, format="%Y-%m-%d")
I agree with Thomas Vetterli's answer. It pays to be careful about what kind of time data you are using.
If you only need the calendar date (year, month and day), then datetime.datetime.toordinal is enough:
>>> import datetime
>>> import pandas as pd
>>> data_df['conv_date'] = pd.to_datetime(data_df.date, format="%Y-%m-%d")
>>> data_df['conv_date'] = data_df['conv_date'].map(datetime.datetime.toordinal)
737577
But if you also want to use the hour, minute and second information, then time.mktime() is a better fit:
>>> import time
>>> import pandas as pd
>>> data_df['conv_date'] = pd.to_datetime(data_df.date, format="%Y-%m-%d %H:%M:%S")
>>> data_df['conv_date'] = data_df['conv_date'].apply(lambda var: time.mktime(var.timetuple()))
1591016041.0
1591016044.0 is another example output from my data; the value changes as the seconds change.
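To make the difference between the two conversions concrete, here is a minimal sketch. The sample rows are made up for illustration, and the format string uses %m and %d because %M means minutes in strptime-style formats. Since time.mktime() interprets its input as local time, its result depends on the machine's time zone; the sketch instead subtracts the Unix epoch, which is a swapped-in alternative and not the method from the answer above.

```python
import datetime

import pandas as pd

# Hypothetical sample data; the column name "date" follows the answer above.
data_df = pd.DataFrame({"date": ["2020-06-01 12:54:01", "2020-06-01 12:54:04"]})
data_df["conv_date"] = pd.to_datetime(data_df["date"], format="%Y-%m-%d %H:%M:%S")

# Day resolution: both rows fall on the same day, so they get the same ordinal.
ordinals = data_df["conv_date"].map(datetime.datetime.toordinal)

# Second resolution: seconds since 1970-01-01, treating the naive timestamps
# as UTC so the result does not depend on the local time zone.
epoch = pd.Timestamp("1970-01-01")
seconds = (data_df["conv_date"] - epoch) / pd.Timedelta(seconds=1)

print(ordinals.tolist())
print(seconds.tolist())
```

The ordinals are identical for both rows, while the second-resolution values differ by 3, which is why the ordinal is only suitable when the time of day does not matter.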
Convert as follows:
1) Make the date column the dataframe index:
df = df.set_index('date', append=False)
2) Convert the datetime index to float64 (Julian dates), keeping the rest of the dataframe:
df.index = df.index.to_julian_date()
Then run the regression with the date as the independent variable.
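The two steps above can be sketched as follows. The data and the column name "value" are invented for illustration, and numpy.polyfit stands in for whatever regression routine you actually use. Julian dates are large numbers (around 2.4 million), so the sketch subtracts the first one before fitting to keep the fit numerically well behaved; that shift is a choice made here, not part of the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a value that grows by 2 per day.
df = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=5, freq="D"),
    "value": [10.0, 12.0, 14.0, 16.0, 18.0],
})

# 1) date column becomes the index
df = df.set_index("date", append=False)

# 2) datetime index -> float64 Julian dates (unit: days)
x = df.index.to_julian_date().to_numpy()
y = df["value"].to_numpy()

# Fit y = slope * (x - x0) + intercept, with x0 the first Julian date.
x0 = x[0]
slope, intercept = np.polyfit(x - x0, y, deg=1)

print(slope, intercept)
```

Because Julian dates are measured in days, the slope reads directly as "change in value per day".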
Linear regression works on numerical data, so a datetime column is not suitable as it is. Split it into three separate columns (year, month and day), then remove the original column.
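A minimal sketch of that split, assuming a column named "Date" holding strings in year-month-day order (both the name and the sample values are made up here). Note that this gives the model three independent features rather than a single time axis, so it may suit seasonal effects better than a plain trend.

```python
import pandas as pd

# Hypothetical sample data.
data_df = pd.DataFrame({"Date": ["2020-06-01", "2021-12-31"]})

# Parse once, then pull out the numeric components with the .dt accessor.
dates = pd.to_datetime(data_df["Date"], format="%Y-%m-%d")
data_df["year"] = dates.dt.year
data_df["month"] = dates.dt.month
data_df["day"] = dates.dt.day

# Drop the original column so only numeric features remain.
features = data_df.drop(columns=["Date"])

print(features)
```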
Linear regression doesn't work on date data, so we need to convert it into a numerical value. The following code does that:
import datetime as dt
import pandas as pd
data_df['Date'] = pd.to_datetime(data_df['Date'])
data_df['Date'] = data_df['Date'].map(dt.datetime.toordinal)
When using dt.datetime.toordinal, be careful: it only converts the date part and ignores hours, minutes and seconds. To get a number from a full datetime object you can use something like:
import time
df['Datetime column'].apply(lambda x: time.mktime(x.timetuple()))
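Putting the pieces together for the original question, here is a rough end-to-end sketch: convert the dates to ordinals, fit a straight line, and evaluate it at a future date. The data, the "Value" column and the predict helper are all hypothetical, and numpy.polyfit is used only as one simple way to fit the line; a library such as scikit-learn would work just as well.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Hypothetical data: the value rises by 3 per day.
data_df = pd.DataFrame({
    "Date": ["2020-06-01", "2020-06-02", "2020-06-03", "2020-06-04"],
    "Value": [100.0, 103.0, 106.0, 109.0],
})

# Dates -> day ordinals (integers), then to float for the fit.
data_df["Date"] = pd.to_datetime(data_df["Date"], format="%Y-%m-%d")
x = data_df["Date"].map(dt.datetime.toordinal).to_numpy(dtype=float)
y = data_df["Value"].to_numpy()

# Shift by the first ordinal so the fit works with small numbers.
x0 = x.min()
slope, intercept = np.polyfit(x - x0, y, deg=1)


def predict(date_string):
    """Hypothetical helper: evaluate the fitted line at a YYYY-MM-DD date."""
    day = dt.datetime.strptime(date_string, "%Y-%m-%d").toordinal()
    return slope * (day - x0) + intercept


prediction = predict("2020-06-10")
print(prediction)
```

The key point is that the future date must go through exactly the same conversion (and the same shift) as the training dates before it is passed to the fitted line.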