Parse a Pandas column to Datetime when importing table from SQL database and filtering rows by date


执念已碎

I have a DataFrame with a column named date. How can we convert/parse the 'date' column to a DateTime object?

I loaded the dat

5 Answers

情话喂你

2020-12-01 09:36
pandas already reads that as a datetime object! So what you want is to select rows between two dates and you can do that by masking:
```
df_masked = df[(df.date > '2012-04-01') & (df.date < '2012-04-04')]
```
Because you said that you were getting an error from the string for some reason, try this:
```
import datetime  # needed for datetime.date below
df_masked = df[(df.date > datetime.date(2012,4,1)) & (df.date < datetime.date(2012,4,4))]
```
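For reference, here is a minimal self-contained sketch of the string-based mask. The sample rows and the `value` column are made up for illustration; the real frame would come from the SQL import.

```python
import pandas as pd

# Hypothetical sample data standing in for the imported table.
df = pd.DataFrame({
    "date": pd.to_datetime(["2012-03-31", "2012-04-02", "2012-04-03", "2012-04-05"]),
    "value": [1, 2, 3, 4],
})

# Both bounds are strict: keep rows after 1 April and before 4 April.
df_masked = df[(df["date"] > "2012-04-01") & (df["date"] < "2012-04-04")]

print(df_masked)
```

This relies on the column already having a datetime dtype, so that pandas can parse the ISO-formatted strings for the comparison.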

鱼传尺愫

2020-12-01 09:36
Don't confuse datetime.date with Pandas pd.Timestamp

A "Pandas datetime series" contains pd.Timestamp elements, not datetime.date elements. The recommended solution for Pandas:
```
s = pd.to_datetime(s)    # convert series to a Pandas datetime series
mask = s > '2018-03-10'  # calculate Boolean mask against Pandas-compatible object
```
The top answers have issues:
- The first attempt in @RyanSaxe's accepted answer doesn't work; the second attempt is inefficient.
- As of Pandas v0.23.0, @Keith's highly upvoted answer doesn't work; it gives TypeError.
Any good Pandas solution must ensure:
1. The series is a Pandas datetime series, not object dtype.
2. The datetime series is compared to a compatible object, e.g. pd.Timestamp, or string in the correct format.
Here's a demo with benchmarking, demonstrating that the one-off cost of conversion can be immediately offset by a single operation:
```
from datetime import date
import pandas as pd

L = [date(2018, 1, 10), date(2018, 5, 20), date(2018, 10, 30), date(2018, 11, 11)]
s = pd.Series(L*10**5)

a = s > date(2018, 3, 10)             # accepted solution #2, inefficient
b = pd.to_datetime(s) > '2018-03-10'  # more efficient, including datetime conversion

assert a.equals(b)                    # check solutions give same result

%timeit s > date(2018, 3, 10)                  # 40.5 ms
%timeit pd.to_datetime(s) > '2018-03-10'       # 33.7 ms

s = pd.to_datetime(s)

%timeit s > '2018-03-10'                       # 2.85 ms
```
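The two requirements listed above can also be checked explicitly. This is only a sketch on a tiny made-up series; `pd.api.types.is_datetime64_any_dtype` is one way to confirm the dtype, not the only one.

```python
from datetime import date

import pandas as pd

# Hypothetical object-dtype series holding datetime.date values.
s = pd.Series([date(2018, 1, 10), date(2018, 5, 20)])
was_datetime = pd.api.types.is_datetime64_any_dtype(s)   # object dtype before conversion

# Requirement 1: make it a Pandas datetime series.
s = pd.to_datetime(s)
is_datetime = pd.api.types.is_datetime64_any_dtype(s)

# Requirement 2: compare against a compatible object such as pd.Timestamp.
mask = s > pd.Timestamp("2018-03-10")

print(was_datetime, is_datetime, mask.tolist())
```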

闹比i

2020-12-01 09:50
You should iterate over the items and parse them independently, then construct a new list.
```
import dateutil.parser  # third-party python-dateutil package
df['date'] = [dateutil.parser.parse(x) for x in df['date']]
```

暗喜

2020-12-01 09:56
Pandas has a datetime type, but some of the import functions read the column in as strings. So you need to make sure the column is set to the datetime type rather than string. Then you can make your query.
```
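import pandas as pd
df['date'] = pd.to_datetime(df['date'])  # make sure the column is datetime64, not strings
# compare against pd.Timestamp; datetime.date can raise TypeError against a datetime64 column
df_masked = df[(df['date'] > pd.Timestamp(2012, 4, 1)) & (df['date'] < pd.Timestamp(2012, 4, 4))]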
```
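Since the question is about a table imported from SQL, the conversion can also happen at read time. The sketch below assumes an in-memory SQLite database with a made-up `events` table, purely for illustration; `parse_dates` is a documented parameter of `pd.read_sql_query`.

```python
import sqlite3

import pandas as pd

# Hypothetical table standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (date TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("2012-03-31", 1), ("2012-04-02", 2), ("2012-04-05", 3)],
)
conn.commit()

# Without parse_dates the TEXT column arrives as plain strings.
raw = pd.read_sql_query("SELECT * FROM events", conn)

# With parse_dates the column is converted to datetime64 on import.
parsed = pd.read_sql_query("SELECT * FROM events", conn, parse_dates=["date"])
conn.close()

lower, upper = pd.Timestamp("2012-04-01"), pd.Timestamp("2012-04-04")
df_masked = parsed[(parsed["date"] > lower) & (parsed["date"] < upper)]
print(df_masked)
```

Whether a column needs `parse_dates` depends on the database driver and the column type, so checking `df.dtypes` after the import is a cheap sanity test.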

离开以前

2020-12-01 09:59
You probably need apply, so something like:
```
import dateutil.parser  # third-party python-dateutil package
df['date'] = df['date'].apply(dateutil.parser.parse)
```
Without an example of the column I can't guarantee this will work, but something in that direction should help you to carry on.
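If the column holds strings, a vectorized alternative to parsing element by element is `pd.to_datetime`. The following is a sketch on invented values; `errors="coerce"` is used here as an assumption about how unparseable entries should be handled (they become `NaT` instead of raising).

```python
import pandas as pd

# Hypothetical string column with one value that is not a date.
df = pd.DataFrame({"date": ["2012-04-01", "2012-04-03", "not a date"]})

# Convert the whole column at once; bad entries turn into NaT.
df["date"] = pd.to_datetime(df["date"], errors="coerce")

n_bad = int(df["date"].isna().sum())
print(df["date"].dtype, n_bad)
```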

