Question
Suppose I have a pandas DataFrame with a column whose values are datetime64[ns].
Out[204]:
0 2015-03-20 00:00:28
1 2015-03-20 00:01:44
2 2015-03-20 00:02:55
3 2015-03-20 00:03:39
4 2015-03-20 00:04:32
5 2015-03-20 00:05:52
6 2015-03-20 00:06:36
7 2015-03-20 00:07:44
8 2015-03-20 00:08:56
9 2015-03-20 00:09:47
Name: DateTime, dtype: datetime64[ns]
Is there an easy way to round them up to the next whole minute? That is, I want the following:
Out[204]:
0 2015-03-20 00:01:00
1 2015-03-20 00:02:00
2 2015-03-20 00:03:00
3 2015-03-20 00:04:00
4 2015-03-20 00:05:00
5 2015-03-20 00:06:00
6 2015-03-20 00:07:00
7 2015-03-20 00:08:00
8 2015-03-20 00:09:00
9 2015-03-20 00:10:00
Name: DateTime, dtype: datetime64[ns]
I wrote some complicated code that first converts the values to strings and extracts the three portions of, say, 00:09:47, converting each to an integer. Unless the last portion (seconds) is already 00, it sets the seconds to 00 and adds 1 to the middle portion (minutes), except when the minutes are already 59, in which case it adds 1 to the first portion (hours) instead. It then recombines the new integers into a string and reconstructs the DateTime.
But I suspect there is already a simpler solution. Would anyone have any suggestions?
* EDIT *
@Jeff, @unutbu, thanks for your answers. I can only select one answer in SO, but both work.
Answer 1:
Given a DataFrame with a column of dtype datetime64[ns], you could use
df['date'] += np.array(-df['date'].dt.second % 60, dtype='<m8[s]')
to add the appropriate number of seconds to obtain the ceiling.
For example,
import io
import sys
import numpy as np
import pandas as pd
StringIO = io.BytesIO if sys.version < '3' else io.StringIO
df = '''\
2015-03-20 00:00:00
2015-03-20 00:00:28
2015-03-20 00:01:44
2015-03-20 00:02:55
2015-03-20 00:03:39
2015-03-20 00:04:32
2015-03-20 00:05:52
2015-03-20 00:06:36
2015-03-20 00:07:44
2015-03-20 00:08:56
2015-03-20 00:09:47'''
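# Raw string for the regex separator; regex separators need the Python parser engine
df = pd.read_table(StringIO(df), sep=r'\s{2,}', engine='python',
                   header=None, parse_dates=[0], names=['date'])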
df['date'] += np.array(-df['date'].dt.second % 60, dtype='<m8[s]')
print(df)
yields
date
0 2015-03-20 00:00:00
1 2015-03-20 00:01:00
2 2015-03-20 00:02:00
3 2015-03-20 00:03:00
4 2015-03-20 00:04:00
5 2015-03-20 00:05:00
6 2015-03-20 00:06:00
7 2015-03-20 00:07:00
8 2015-03-20 00:08:00
9 2015-03-20 00:09:00
10 2015-03-20 00:10:00
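The arithmetic behind `-second % 60` may be easier to see on a small example. The sketch below is only an illustration: the sample timestamps and the names `dates`, `missing` and `ceiled` are made up here, and `pd.to_timedelta` is used in place of the NumPy cast. Because Python-style modulo takes the sign of the divisor, `-28 % 60` is `32`, which is exactly the number of seconds still missing until the next minute, while `-0 % 60` stays `0`.

```python
import pandas as pd

# Hypothetical sample data: one exact minute, one mid-minute value, one hour rollover
dates = pd.Series(pd.to_datetime([
    "2015-03-20 00:00:00",
    "2015-03-20 00:00:28",
    "2015-03-20 00:59:47",
]))

# Seconds still missing until the next whole minute (0 for exact minutes)
missing = (-dates.dt.second) % 60

# Adding that many seconds lands each value on a minute boundary
ceiled = dates + pd.to_timedelta(missing, unit="s")

print(missing.tolist())
print(ceiled)
```

Note that this only looks at the whole-second component, so a timestamp with a fractional second would keep that fraction after the addition.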
Answer 2:
Here's another way. Add the number of seconds remaining until the next whole minute (sort of like round). This is vectorized.
In [46]: df.date+pd.to_timedelta(-df.date.dt.second % 60,unit='s')
Out[46]:
0 2015-03-20 00:01:00
1 2015-03-20 00:02:00
2 2015-03-20 00:03:00
3 2015-03-20 00:04:00
4 2015-03-20 00:05:00
5 2015-03-20 00:06:00
6 2015-03-20 00:07:00
7 2015-03-20 00:08:00
8 2015-03-20 00:09:00
9 2015-03-20 00:10:00
dtype: datetime64[ns]
Here's another way. Changing something to a Period of another frequency rounds it. (Note that this is a bit clunky ATM because Periods are not full-fledged as a column type). This is vectorized.
In [48]: pd.Series(pd.PeriodIndex(df.date.dt.to_period('T')+1).to_timestamp())
Out[48]:
0 2015-03-20 00:01:00
1 2015-03-20 00:02:00
2 2015-03-20 00:03:00
3 2015-03-20 00:04:00
4 2015-03-20 00:05:00
5 2015-03-20 00:06:00
6 2015-03-20 00:07:00
7 2015-03-20 00:08:00
8 2015-03-20 00:09:00
9 2015-03-20 00:10:00
dtype: datetime64[ns]
This last method always rounds up, because it increments the floored period; a timestamp that already falls exactly on a minute is therefore also moved to the next minute.
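To see how the Period approach treats a value that already sits on a minute boundary, here is a small sketch that compares it with `dt.ceil`. The sample data and the names `s`, `via_period` and `via_ceil` are assumptions made for illustration, and the conversion goes through a `DatetimeIndex` rather than the exact expression used above.

```python
import pandas as pd

# Hypothetical sample: the first value is already an exact minute
s = pd.Series(pd.to_datetime(["2015-03-20 00:00:00", "2015-03-20 00:00:28"]))

# Floor to the containing minute period, step to the next period, convert back
periods = pd.DatetimeIndex(s).to_period("min") + 1
via_period = pd.Series(periods.to_timestamp())

# A true ceiling leaves exact minutes untouched
via_ceil = s.dt.ceil("min")

print(via_period)
print(via_ceil)
```

In this sketch the Period route moves both values to 00:01:00, whereas `dt.ceil` keeps the first at 00:00:00, so the two only agree for timestamps that are not already on a boundary.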
Answer 3:
pandas now has a built-in ceil() method for this. For a Series of datetimes it can be accessed as Series.dt.ceil():
In[92]: t
Out[92]:
0 2015-03-20 00:00:28
1 2015-03-20 00:01:44
2 2015-03-20 00:02:55
3 2015-03-20 00:03:39
4 2015-03-20 00:04:32
5 2015-03-20 00:05:52
6 2015-03-20 00:06:36
7 2015-03-20 00:07:44
8 2015-03-20 00:08:56
9 2015-03-20 00:09:47
dtype: datetime64[ns]
In[93]: t.dt.ceil('min')
Out[93]:
0 2015-03-20 00:01:00
1 2015-03-20 00:02:00
2 2015-03-20 00:03:00
3 2015-03-20 00:04:00
4 2015-03-20 00:05:00
5 2015-03-20 00:06:00
6 2015-03-20 00:07:00
7 2015-03-20 00:08:00
8 2015-03-20 00:09:00
9 2015-03-20 00:10:00
dtype: datetime64[ns]
ceil() accepts a frequency parameter. The valid string aliases are listed in the pandas documentation on offset aliases.
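As a rough illustration of the frequency parameter, the sketch below rounds up to a few other boundaries. The sample values and variable names are made up for this example; `'5min'` and `'D'` are standard pandas offset aliases, and a single `pd.Timestamp` has its own `ceil` method as well.

```python
import pandas as pd

# Hypothetical sample values taken from the same day as the question's data
t = pd.Series(pd.to_datetime(["2015-03-20 00:03:39", "2015-03-20 00:09:47"]))

five_min = t.dt.ceil("5min")   # next 5-minute boundary
next_day = t.dt.ceil("D")      # next midnight

# Scalar timestamps support the same operation
single = pd.Timestamp("2015-03-20 00:09:47").ceil("min")

print(five_min)
print(next_day)
print(single)
```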
Answer 4:
I think it might need a little bit of work, but I think this is roughly what you're after (I'm sure there's a way to use .snap or an offset's .rollforward, but I can't seem to get those to work):
from datetime import datetime, timedelta
import pandas as pd

ps = pd.Series([
    datetime(2015, 1, 1, 19, 18, 34),  # roll up min, reset sec
    datetime(2015, 1, 1, 1, 1, 1),  # roll up min, reset sec
    datetime(2015, 1, 1, 0, 0, 0),  # do nothing
    datetime(2015, 1, 1, 23, 59, 1),  # roll day/hr/min, reset sec
    datetime(2015, 1, 31, 23, 59, 1),  # roll mth/day/hr/min, reset sec
    datetime(2015, 12, 31, 23, 59, 1)  # roll yr/month/day/hr/min - reset sec
])
# Only rows with non-zero seconds are changed: add one minute, then zero the seconds
ps[ps.dt.second != 0] = ps.apply(lambda L: (L + timedelta(minutes=1)).replace(second=0))
Which gives you:
0 2015-01-01 19:19:00
1 2015-01-01 01:02:00
2 2015-01-01 00:00:00
3 2015-01-02 00:00:00
4 2015-02-01 00:00:00
5 2016-01-01 00:00:00
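For whole-second data like this, the row-wise approach and `Series.dt.ceil('min')` should agree, including on the day, month and year rollovers. The following is a minimal check under that assumption; the shortened sample and the names `manual` and `ceiled` are illustrative only.

```python
from datetime import datetime, timedelta

import pandas as pd

# A subset of the rollover cases, with whole seconds only
ps = pd.Series([
    datetime(2015, 1, 1, 0, 0, 0),      # already on a minute
    datetime(2015, 1, 31, 23, 59, 1),   # rolls into the next month
    datetime(2015, 12, 31, 23, 59, 1),  # rolls into the next year
])

# Row-wise version: keep exact minutes, otherwise add a minute and zero the seconds
bumped = ps.apply(lambda ts: (ts + timedelta(minutes=1)).replace(second=0))
manual = ps.where(ps.dt.second == 0, bumped)

# Built-in vectorized ceiling
ceiled = ps.dt.ceil("min")

print(manual.tolist() == ceiled.tolist())
```

The two can differ once sub-second components are present, since `replace(second=0)` leaves microseconds in place.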
Source: https://stackoverflow.com/questions/29177656/how-to-apply-ceiling-to-pandas-datetime