I'm analyzing a time series, and based on certain criteria, I can pick out rows that are either the start or the end of the events. At thi
You can achieve this by looking at the cumulative sums of the 'event start' and 'event end' markers. Start with the cumulative sum of 'event start', which numbers every row from each start onward:
>>> data['event number'] = (data.event == 'event start').cumsum()
>>> data
                           event  event number
2010-01-01 00:20:00  event start             1
2010-01-01 00:30:00           --             1
2010-01-01 00:40:00           --             1
2010-01-01 00:50:00           --             1
2010-01-01 01:00:00           --             1
2010-01-01 01:10:00    event end             1
2010-01-01 01:20:00           --             1
2010-01-01 02:20:00           --             1
2010-01-01 02:30:00  event start             2
2010-01-01 02:40:00           --             2
2010-01-01 02:50:00           --             2
2010-01-01 03:00:00           --             2
2010-01-01 03:10:00           --             2
2010-01-01 03:20:00           --             2
2010-01-01 03:30:00    event end             2
Now you just need to set the event number to NaN where there is no event. Those are exactly the rows where the cumulative sum of 'event start' equals the cumulative sum of 'event end' shifted down by one row (the shift keeps the 'event end' row itself inside its event):
>>> import numpy as np
>>> idx = data['event number'] == (data.event.shift(1) == 'event end').cumsum()
>>> data.loc[idx, 'event number'] = np.nan
>>> data
                           event  event number
2010-01-01 00:20:00  event start             1
2010-01-01 00:30:00           --             1
2010-01-01 00:40:00           --             1
2010-01-01 00:50:00           --             1
2010-01-01 01:00:00           --             1
2010-01-01 01:10:00    event end             1
2010-01-01 01:20:00           --           NaN
2010-01-01 02:20:00           --           NaN
2010-01-01 02:30:00  event start             2
2010-01-01 02:40:00           --             2
2010-01-01 02:50:00           --             2
2010-01-01 03:00:00           --             2
2010-01-01 03:10:00           --             2
2010-01-01 03:20:00           --             2
2010-01-01 03:30:00    event end             2
[15 rows x 2 columns]
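If you want to try the two steps end to end, here is a minimal self-contained sketch. The sample frame is my reconstruction from the printed output above (the original data isn't shown), and the names `starts` and `ends_before` are just illustrative. One deliberate difference: instead of assigning `np.nan` through `.loc`, it uses `Series.where`, which does the same masking without writing NaN into an integer column (recent pandas versions may warn about that assignment).

```python
import pandas as pd

# Assumed sample data: 15 rows at the timestamps shown in the output above.
index = pd.date_range("2010-01-01 00:20", "2010-01-01 01:20", freq="10min").append(
    pd.date_range("2010-01-01 02:20", "2010-01-01 03:30", freq="10min")
)
events = ["--"] * len(index)
events[0] = "event start"
events[5] = "event end"
events[8] = "event start"
events[14] = "event end"
data = pd.DataFrame({"event": events}, index=index)

# Step 1: running count of event starts seen so far.
starts = (data["event"] == "event start").cumsum()

# Step 2: running count of event ends, shifted down one row so that the
# 'event end' row itself still belongs to its event.
ends_before = (data["event"].shift(1) == "event end").cumsum()

# Keep the event number only where more starts than (shifted) ends have
# been seen; everywhere else becomes NaN.
data["event number"] = starts.where(starts != ends_before)

print(data)
```

Because of the NaN values the resulting column has a float dtype; if you prefer integers with missing values, converting it with `astype("Int64")` (pandas' nullable integer type) should work.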