I have a Pandas DataFrame with a date column (e.g. 2013-04-01) of dtype datetime.date. When I include that column in X_train
You have two options. You can convert each date to an ordinal, i.e. an integer day count in which January 1 of year 1 is day 1. You can do this with datetime.date's toordinal method.
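As a minimal sketch of the ordinal approach: the DataFrame below, the value column, and the date_ordinal column name are all made up for illustration, and it assumes the date column really holds datetime.date objects as described in the question.

```python
import datetime

import pandas as pd

# Hypothetical example data; only the "date" column mirrors the question.
df = pd.DataFrame({
    "date": [datetime.date(2013, 4, 1), datetime.date(2013, 5, 1)],
    "value": [10.0, 12.5],
})

# Replace each datetime.date with its proleptic Gregorian ordinal (an int).
df["date_ordinal"] = df["date"].apply(lambda d: d.toordinal())

# Use the integer column as a feature instead of the raw date objects.
X_train = df[["date_ordinal", "value"]]
print(X_train)
```

The ordinals preserve the spacing between dates (April 1 and May 1 here differ by 30), so a model can treat the column as an ordinary numeric feature.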
Alternatively, you can turn the dates into categorical variables using sklearn's OneHotEncoder, which creates a new binary variable for each distinct date. So instead of a single column date with values ['2013-04-01', '2013-05-01'], you will have two columns: date_2013_04_01 with values [1, 0] and date_2013_05_01 with values [0, 1].
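A rough sketch of the one-hot approach might look like the following. The DataFrame and the date_str column are hypothetical, and converting the dates to strings first is my own assumption to keep the categories simple. The date_2013_04_01 style column names are built by hand here, since OneHotEncoder itself returns an unlabeled matrix.

```python
import datetime

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical example data with two distinct dates.
df = pd.DataFrame({
    "date": [datetime.date(2013, 4, 1), datetime.date(2013, 5, 1)],
})

# Assumption: encode the ISO string form of each date, e.g. "2013-04-01".
df["date_str"] = df["date"].astype(str)

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default; densify it for display.
encoded = encoder.fit_transform(df[["date_str"]]).toarray()

# Build readable column names from the categories the encoder learned.
columns = ["date_" + c.replace("-", "_") for c in encoder.categories_[0]]
one_hot = pd.DataFrame(encoded, columns=columns, index=df.index)
print(one_hot)
```

Each row has a single 1 in the column for its own date and 0 everywhere else, so the number of columns grows with the number of distinct dates.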
I would recommend the toordinal approach if you have many different dates, and the one-hot encoder if the number of distinct dates is small (say up to 10 to 100, depending on the size of your data and what sort of relation the date has with the output variable), since one-hot encoding adds one column per distinct date.