Given two dataframes df_1 and df_2, how to join them such that datetime column df_1 is in between start and end
In this method, we assume TimeStamp objects are used.
df2 start end event
0 2016-05-14 10:54:31 2016-05-14 10:54:33 E1
1 2016-05-14 10:54:34 2016-05-14 10:54:37 E2
2 2016-05-14 10:54:38 2016-05-14 10:54:42 E3
event_num = len(df2.event)
def get_event(t):
event_idx = ((t >= df2.start) & (t <= df2.end)).dot(np.arange(event_num))
return df2.event[event_idx]
df1["event"] = df1.timestamp.transform(get_event)
Explanation of get_event
For each timestamp in df1, say t0 = 2016-05-14 10:54:33,
(t0 >= df2.start) & (t0 <= df2.end) will contain 1 true. (See example 1). Then, take a dot product with np.arange(event_num) to get the index of the event that a t0 belongs to.
Examples:
Example 1
t0 >= df2.start t0 <= df2.end After & np.arange(3)
0 True True -> T 0 event_idx
1 False True -> F 1 -> 0
2 False True -> F 2
Take t2 = 2016-05-14 10:54:35 for another example
t2 >= df2.start t2 <= df2.end After & np.arange(3)
0 True False -> F 0 event_idx
1 True True -> T 1 -> 1
2 False True -> F 2
We finally use transform to transform each timestamp into an event.