Efficiently check if value is present in any of given ranges

猫巷女王i 2021-01-06 10:27

I have two pandas DataFrame objects:

  • A contains 'start' and 'finish' columns

  • B has c

2 answers
  •  夕颜 (original poster)
    2021-01-06 11:26

    You can do this with a single linear pass once the interval endpoints are sorted (the sort itself costs O(n log n)). The idea is to change the representation. In A, you store one row per interval. I would suggest a DataFrame that stores one row per transition instead (i.e. entering an interval or leaving an interval).

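    import pandas as pd
    
    A = pd.DataFrame(
        data={
            'start': [1, 50, 30],
            'finish': [3, 83, 42]
        }
    )
    
    # +1 marks the point where an interval opens, -1 the point where it closes
    starts = pd.DataFrame(data={'start': 1}, index=A.start.tolist())
    finishes = pd.DataFrame(data={'finish': -1}, index=A.finish.tolist())
    # The outer merge on the index lines up all endpoints in sorted order;
    # missing entries become 0, and astype(int) undoes the float upcast from NaN
    transitions = pd.merge(starts, finishes, how='outer', left_index=True, right_index=True).fillna(0).astype(int)
    transitions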
    
        start  finish
    1       1       0
    3       0      -1
    30      1       0
    42      0      -1
    50      1       0
    83      0      -1
    

    This DataFrame records, for each endpoint, which kind of transition happens there. Next, we need to know at each endpoint whether we are inside an interval or not. This is like counting opening and closing parentheses. You can do:

    transitions['transition'] = (transitions.pop('finish') + transitions.pop('start')).cumsum()
    transitions
    
        transition
    1            1
    3            0
    30           1
    42           0
    50           1
    83           0
    

    Here it says:

    • At 1, I'm in an interval
    • At 3, I'm not (so the finish value itself counts as outside)
    • In general, if the value is strictly greater than 0, it's in an interval.
    • Note that this handles overlapping intervals
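
To make the overlapping case concrete, here is a minimal sketch of the same counting idea on two made-up intervals that overlap. The data and the names `A_overlap`, `deltas` and `depth` are assumptions for illustration, not part of the question. Summing the deltas per endpoint with `groupby` is also an addition here, so that endpoints shared by several intervals collapse into a single row.

```python
import pandas as pd

# Hypothetical overlapping intervals: [1, 10) and [5, 20)
A_overlap = pd.DataFrame({'start': [1, 5], 'finish': [10, 20]})

# +1 at every start, -1 at every finish
deltas = pd.concat([
    pd.Series(1, index=A_overlap['start'].to_numpy()),
    pd.Series(-1, index=A_overlap['finish'].to_numpy()),
])

# Sum the deltas per endpoint (groupby sorts the endpoints), then keep a running total
depth = deltas.groupby(level=0).sum().cumsum()

print(depth)
# 1     1
# 5     2
# 10    1
# 20    0
```

The running count reaches 2 where both intervals are open and only drops back to 0 at 20, which is why "strictly greater than 0" is the right test rather than "equal to 1".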

    And now you merge with your B dataframe:

    B = pd.DataFrame(
        index=[31, 20, 2.5, 84, 1000]
    )
    
    # Forward-fill the running count onto B's values; anything before the first start gets 0
    pd.merge(transitions, B, how='outer', left_index=True, right_index=True).ffill().fillna(0).loc[B.index].astype(bool)
    
           transition
    31.0         True
    20.0        False
    2.5          True
    84.0        False
    1000.0      False
    
