Convert incomplete 12h datetime-like strings into appropriate datetime type

ぐ巨炮叔叔 submitted on 2019-12-02 07:53:34

You have data at 5-second intervals spanning multiple days. The desired end format looks like this, including the AM/PM marker that we need to add (pandas cannot infer it on its own, because it parses one value at a time):

31/12/2016 11:59:55 PM
01/01/2017 12:00:00 AM
01/01/2017 12:00:05 AM
01/01/2017 11:59:55 AM
01/01/2017 12:00:00 PM
01/01/2017 12:59:55 PM
01/01/2017 01:00:00 PM
01/01/2017 01:00:05 PM
01/01/2017 11:59:55 PM
02/01/2017 12:00:00 AM

First, we can parse the whole thing without AM/PM info, as you already showed:

import pandas as pd
ts = pd.to_datetime(df.TS, format='%d/%m/%Y %I:%M:%S')

We have a small problem: the 12 o'clock hour (12:00:00 to 12:59:55) can come out as hour 12 rather than hour 0. Let's normalize that so every time falls in the first half of the day:

ts[ts.dt.hour == 12] -= pd.Timedelta(12, 'h')

Now we have times from 00:00:00 to 11:59:55, twice per day.
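
As a quick check of these first two steps, here is a minimal sketch on four made-up rows. The DataFrame and its contents are assumptions for illustration; only the column name TS and the format string come from the steps above. The subtraction is harmless if the parser has already mapped hour 12 to hour 0.

```python
import pandas as pd

# Illustrative input without AM/PM (assumed data, column name TS as above).
df = pd.DataFrame({"TS": ["31/12/2016 11:59:55",
                          "01/01/2017 12:00:00",
                          "01/01/2017 12:00:05",
                          "01/01/2017 01:00:00"]})

ts = pd.to_datetime(df["TS"], format="%d/%m/%Y %I:%M:%S")

# Fold any hour-12 values down to hour 0; a no-op if none are present.
ts[ts.dt.hour == 12] -= pd.Timedelta(12, "h")

hours = ts.dt.hour.tolist()
print(hours)  # [11, 0, 0, 1]
```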

Next, note that the AM/PM transitions now always fall on rows at 00:00:00. A 00:00:00 row where the date has changed from the previous row is midnight; a 00:00:00 row on the same date is noon. We can detect both:

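import datetime

twelve = ts.dt.time == datetime.time(0, 0, 0)         # rows at exactly 00:00:00
newdate = ts.dt.normalize().diff() > pd.Timedelta(0)  # True where the date differs from the previous row
midnight = twelve & newdate
noon = twelve & ~newdate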
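
To see what the masks look like, here is a small sketch on hand-written, already-normalized timestamps (the six values are assumptions chosen to cover each case). It compares dates via `ts.dt.normalize()`, which keeps the data in datetime64 dtype. Note that the very first row can never be flagged as a new date, since it has no predecessor.

```python
import datetime
import pandas as pd

# Already-normalized timestamps (hours 0-11), one row per interesting case.
ts = pd.Series(pd.to_datetime([
    "2016-12-31 11:59:55",  # before any transition
    "2017-01-01 00:00:00",  # date changed -> midnight
    "2017-01-01 11:59:55",
    "2017-01-01 00:00:00",  # same date -> noon
    "2017-01-01 11:59:55",
    "2017-01-02 00:00:00",  # date changed -> midnight
]))

twelve = ts.dt.time == datetime.time(0, 0, 0)
newdate = ts.dt.normalize().diff() > pd.Timedelta(0)
midnight = twelve & newdate
noon = twelve & ~newdate

print(midnight.tolist())  # [False, True, False, False, False, True]
print(noon.tolist())      # [False, False, False, True, False, False]
```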

Next, build an offset series, which should be easy to inspect for correctness:

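offset = pd.Series(pd.NaT, index=ts.index, dtype='timedelta64[ns]')
offset[midnight] = pd.Timedelta(0)      # AM half starts here
offset[noon] = pd.Timedelta(12, 'h')    # PM half starts here
offset = offset.ffill()                 # carry each offset forward to the next transition
# Rows before the first transition belong to the opposite half-day.
if offset.notna().any():
    offset = offset.fillna(pd.Timedelta(12, 'h') - offset.dropna().iloc[0])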

And finally:

ts += offset
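
Putting the steps together, the following is a sketch rather than a definitive implementation. The function name `restore_ampm` is hypothetical, and the sample rows are the ones from the desired output above with the AM/PM marker stripped. It assumes the rows are in chronological order, that every 12:00:00 row is present, and that the data does not start exactly on a 12:00:00 row (which would be ambiguous). It also fills the rows before the first detected transition, which plain forward-filling leaves as NaT.

```python
import datetime
import pandas as pd

def restore_ampm(strings: pd.Series) -> pd.Series:
    """Hypothetical helper: infer AM/PM for ordered 12-hour strings lacking it."""
    ts = pd.to_datetime(strings, format="%d/%m/%Y %I:%M:%S")
    ts[ts.dt.hour == 12] -= pd.Timedelta(12, "h")  # fold hour 12 to hour 0

    twelve = ts.dt.time == datetime.time(0, 0, 0)
    newdate = ts.dt.normalize().diff() > pd.Timedelta(0)

    offset = pd.Series(pd.NaT, index=ts.index, dtype="timedelta64[ns]")
    offset[twelve & newdate] = pd.Timedelta(0)         # midnight
    offset[twelve & ~newdate] = pd.Timedelta(12, "h")  # noon
    offset = offset.ffill()

    # Rows before the first transition belong to the opposite half-day.
    known = offset.dropna()
    if not known.empty:
        offset = offset.fillna(pd.Timedelta(12, "h") - known.iloc[0])
    return ts + offset

raw = pd.Series([
    "31/12/2016 11:59:55", "01/01/2017 12:00:00", "01/01/2017 12:00:05",
    "01/01/2017 11:59:55", "01/01/2017 12:00:00", "01/01/2017 12:59:55",
    "01/01/2017 01:00:00", "01/01/2017 01:00:05", "01/01/2017 11:59:55",
    "02/01/2017 12:00:00",
])
out = restore_ampm(raw)
print(out)
```

If the 12-hour text form is needed afterwards, `out.dt.strftime("%d/%m/%Y %I:%M:%S %p")` should produce it, though the AM/PM text depends on the active locale. If the data contains no 12:00:00 row at all, there is nothing to anchor on and the result stays NaT.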