I have a DataFrame with a column named date. How can I convert/parse the 'date' column to a DateTime object?
I loaded the dat
pandas already reads that as a datetime object! So what you want is to select rows between two dates, and you can do that by masking:
df_masked = df[(df.date > '2012-04-01') & (df.date < '2012-04-04')]
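A minimal, self-contained sketch of the masking approach above, assuming a hypothetical DataFrame whose date column already has a datetime64 dtype. It also shows Series.between, which (unlike the strict > and < comparisons) includes both endpoints by default:

```python
import pandas as pd

# Hypothetical sample data; in practice df comes from your own loading step
df = pd.DataFrame({
    "date": pd.to_datetime(["2012-03-31", "2012-04-01", "2012-04-02", "2012-04-04"]),
    "value": [1, 2, 3, 4],
})

# Strict inequalities: keeps rows strictly after 2012-04-01 and strictly before 2012-04-04
df_masked = df[(df.date > "2012-04-01") & (df.date < "2012-04-04")]

# Series.between includes both endpoints by default
df_between = df[df.date.between("2012-04-01", "2012-04-04")]
```

Here df_masked keeps only the 2012-04-02 row, while df_between also keeps the rows dated exactly 2012-04-01 and 2012-04-04.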
Since you said you were getting an error from the string comparison for some reason, try comparing against datetime.date objects instead:
import datetime
df_masked = df[(df.date > datetime.date(2012,4,1)) & (df.date < datetime.date(2012,4,4))]
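The datetime.date comparison works element-wise when the column actually holds Python datetime.date objects, which pandas stores with object dtype. A minimal sketch under that assumption (the sample data is hypothetical); with a datetime64 column, recent pandas versions may instead reject comparisons against datetime.date:

```python
import datetime
import pandas as pd

# Assumption: the column holds datetime.date objects, so pandas keeps it as object dtype
df = pd.DataFrame({"date": [datetime.date(2012, 3, 31),
                            datetime.date(2012, 4, 2),
                            datetime.date(2012, 4, 5)]})

# Each element is compared as a Python date against a Python date
df_masked = df[(df.date > datetime.date(2012, 4, 1)) & (df.date < datetime.date(2012, 4, 4))]
```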
Avoid mixing datetime.date with Pandas pd.Timestamp
A "Pandas datetime series" contains pd.Timestamp elements, not datetime.date elements. The recommended solution for Pandas:
import pandas as pd
s = pd.to_datetime(s)  # convert series to a Pandas datetime dtype
mask = s > '2018-03-10'  # calculate Boolean mask against a Pandas-compatible object
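A runnable version of the recommended pattern, assuming a hypothetical series of ISO date strings such as a file loader might produce. It shows that comparing against an ISO-format string and against an explicit pd.Timestamp gives the same mask once the series is converted:

```python
import pandas as pd

# Hypothetical string series, as a file loader might produce
s = pd.Series(["2018-01-10", "2018-05-20", "2018-10-30"])

s = pd.to_datetime(s)  # one-off conversion to a datetime64 dtype

mask = s > "2018-03-10"                   # ISO-format string, parsed by pandas
mask_ts = s > pd.Timestamp("2018-03-10")  # explicit pd.Timestamp, same result
```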
The top answers have issues; for instance, one of them raises a TypeError. Any good Pandas solution must ensure:
1. The series has a datetime dtype, not object dtype.
2. The datetime series is compared to a compatible object, e.g. pd.Timestamp, or a string in the correct format.
Here's a demo with benchmarking, demonstrating that the one-off cost of conversion can be immediately offset by a single operation:
import pandas as pd
from datetime import date
L = [date(2018, 1, 10), date(2018, 5, 20), date(2018, 10, 30), date(2018, 11, 11)]
s = pd.Series(L*10**5)
a = s > date(2018, 3, 10) # accepted solution #2, inefficient
b = pd.to_datetime(s) > '2018-03-10' # more efficient, including datetime conversion
assert a.equals(b) # check solutions give same result
%timeit s > date(2018, 3, 10) # 40.5 ms
%timeit pd.to_datetime(s) > '2018-03-10' # 33.7 ms
s = pd.to_datetime(s)
%timeit s > '2018-03-10' # 2.85 ms
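The requirements above can be wrapped in a small guard. A minimal sketch, assuming a hypothetical helper named to_datetime_series; it converts only when the dtype is not already datetime, so repeated calls skip the conversion cost the benchmark measures:

```python
import datetime
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

def to_datetime_series(s: pd.Series) -> pd.Series:
    """Hypothetical helper: return s with a datetime64 dtype, converting only if needed."""
    if is_datetime64_any_dtype(s):
        return s  # already datetime: no conversion cost
    return pd.to_datetime(s)

# Object-dtype series of datetime.date values, like the benchmark's input
s = pd.Series([datetime.date(2018, 1, 10), datetime.date(2018, 5, 20)])
s = to_datetime_series(s)              # datetime dtype, not object
mask = s > pd.Timestamp("2018-03-10")  # compatible comparison object
```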
You should iterate over the items and parse them independently, then construct a new list.
import dateutil.parser
df['date'] = [dateutil.parser.parse(x) for x in df['date']]
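A self-contained sketch of the per-item parsing approach, assuming a hypothetical column whose strings use mixed formats. dateutil.parser.parse handles each string on its own, and the resulting list of datetime objects is assigned back to the column:

```python
import dateutil.parser
import pandas as pd

# Hypothetical column with mixed string formats
df = pd.DataFrame({"date": ["2012-04-01", "April 2, 2012", "2012/04/03 14:30"]})

# Parse each string independently, then assign the new list back to the column
df["date"] = [dateutil.parser.parse(x) for x in df["date"]]
```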
Pandas understands datetime objects, but some import functions (read_csv, for example) load a date column as strings. So make sure the column has a datetime dtype rather than a string dtype; then you can run your query.
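import pandas as pd
df['date'] = pd.to_datetime(df['date'])
# compare against pd.Timestamp; recent pandas versions reject datetime.date for datetime64 columns
df_masked = df[(df['date'] > pd.Timestamp(2012, 4, 1)) & (df['date'] < pd.Timestamp(2012, 4, 4))]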
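To see the string-versus-datetime issue at load time, here is a minimal sketch using read_csv on hypothetical in-memory CSV text. Without parse_dates the column comes back as strings; with parse_dates=["date"] it is parsed to a datetime dtype during loading, so no separate conversion is needed:

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file
csv_text = "date,value\n2012-03-31,1\n2012-04-02,2\n2012-04-05,3\n"

# Without parse_dates, the date column holds strings
df_raw = pd.read_csv(io.StringIO(csv_text))

# With parse_dates, the date column is parsed to datetime64 while loading
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

df_masked = df[(df["date"] > pd.Timestamp(2012, 4, 1)) & (df["date"] < pd.Timestamp(2012, 4, 4))]
```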
You probably need apply, so something like:
import dateutil.parser
df['date'] = df['date'].apply(dateutil.parser.parse)
Without an example of the column I can't guarantee this will work, but something in that direction should help you to carry on.
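Since the column's format is unknown, a defensive variant of the apply approach may help. A minimal sketch, with safe_parse as a hypothetical helper that turns unparseable or missing entries into pd.NaT instead of raising; pd.to_datetime(..., errors="coerce") offers similar coercion for formats pandas itself can parse:

```python
import dateutil.parser
import pandas as pd

def safe_parse(x):
    """Hypothetical wrapper: parse x with dateutil, returning pd.NaT instead of raising."""
    try:
        return dateutil.parser.parse(x)
    except (ValueError, OverflowError, TypeError):
        # ValueError covers dateutil's parse errors; TypeError covers non-strings such as None
        return pd.NaT

# Hypothetical column mixing a valid date, an unparseable string, and a missing value
df = pd.DataFrame({"date": ["2012-04-01", "unknown", None]})
df["date"] = df["date"].apply(safe_parse)
```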