I have two pandas DataFrame objects:
A
contains \'start\'
and \'finish\'
columns
B
has c
IIUC you want the output to be True
if there is at least one interval in which the date is?
Is an apply(lambda)
efficient enough for you? (It might be a little long for a big dataframe as it iterates over the rows of B
). If it is, you can try this:
def in_range(date,start,finish):
return (True in ((start < date) & (date < finish)).unique())
B.date.apply(lambda x: in_range(x,A.start,A.finish))
Output:
0 True
1 False
2 True
3 False
4 False
EDIT: MaxU's answer works better in fact. Here are the timers for 10 000 rows dataframes (A and B):
%timeit B2.date.apply(lambda x: in_range(x,A2.start,A2.finish))
1 loop, best of 3: 9.82 s per loop
%timeit B2.date.apply(lambda x: ((x >= A2.start) & (x <= A2.finish)).any())
1 loop, best of 3: 7.31 s per loop
You can do it with a O(n) complexity. The idea is to transform the representation. In A, you store one row per interval. I would suggest a dataframe which stores one row per transition (ie entering an interval, leaving an interval).
A = pd.DataFrame(
data={
'start': [1, 50, 30],
'finish': [3, 83, 42]
}
)
starts = pd.DataFrame(data={'start': 1}, index=A.start.tolist())
finishs = pd.DataFrame(data={'finish': -1}, index=A.finish.tolist())
transitions = pd.merge(starts, finishs, how='outer', left_index=True, right_index=True).fillna(0)
transitions
start finish
1 1 0
3 0 -1
30 1 0
42 0 -1
50 1 0
83 0 -1
this dataframe stores per date the type of transitions. Now, we need to know at each date if we are in an interval or not. It looks like counting the opening & closing parenthesis. You can do:
transitions['transition'] = (transitions.pop('finish') + transitions.pop('start')).cumsum()
transitions
transition
1 1
3 0
30 1
42 0
50 1
83 0
Here it says:
And now you merge with your B dataframe:
B = pd.DataFrame(
index=[31, 20, 2.5, 84, 1000]
)
pd.merge(transitions, B, how='outer', left_index=True, right_index=True).fillna(method='ffill').loc[B.index].astype(bool)
transition
31.0 True
20.0 False
2.5 True
84.0 False
1000.0 False