Efficiently check whether a value is present in any of the given ranges

猫巷女王i 2021-01-06 10:27

I have two pandas DataFrame objects:

  • A contains 'start' and 'finish' columns

  • B has c

2 Answers
  •  旧时难觅i
    2021-01-06 11:10

    IIUC, you want the output to be True if the date falls within at least one interval?

    Is an apply(lambda) efficient enough for you? It can be slow on a large DataFrame, because it iterates over the rows of B and compares each date against every row of A. If that is acceptable, you can try this:

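    def in_range(date, start, finish):
        # True if `date` lies strictly inside at least one (start, finish) interval
        return (True in ((start < date) & (date < finish)).unique())
    
    # Run the check once per row of B (each call scans all of A)
    B.date.apply(lambda x: in_range(x, A.start, A.finish))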
    

    Output:

    0     True
    1    False
    2     True
    3    False
    4    False
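To try the apply approach end to end, here is a self-contained sketch with small, made-up A and B frames (hypothetical data; the question's real frames are not shown). It replaces `True in (...).unique()` with the equivalent and more idiomatic `.any()`. Both versions use strict comparisons, so a date exactly equal to a start or finish (2021-02-15 below) is reported as False.

```python
import pandas as pd

# Hypothetical sample data: A holds the intervals, B holds the dates to test.
A = pd.DataFrame({
    "start": pd.to_datetime(["2021-01-01", "2021-02-01"]),
    "finish": pd.to_datetime(["2021-01-10", "2021-02-15"]),
})
B = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-05", "2021-01-20", "2021-02-10",
                            "2021-02-15", "2021-03-01"]),
})

def in_range(date, start, finish):
    # Same test as in_range above, written with .any(): True if the date
    # lies strictly inside at least one (start, finish) interval.
    return bool(((start < date) & (date < finish)).any())

# One Python-level call per row of B; each call compares against all of A.
result = B.date.apply(lambda x: in_range(x, A.start, A.finish))
print(result)
```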
    

    EDIT: MaxU's answer is in fact faster. Here are the timings on 10,000-row versions of A and B (A2 and B2):

    %timeit B2.date.apply(lambda x: in_range(x,A2.start,A2.finish))
    1 loop, best of 3: 9.82 s per loop
    
    %timeit B2.date.apply(lambda x: ((x >= A2.start) & (x <= A2.finish)).any())
    1 loop, best of 3: 7.31 s per loop
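Both timed versions above still call a Python lambda once per row of B. A common way to avoid that loop (not taken from either answer, so treat it as a sketch) is NumPy broadcasting: compare every date with every interval in one step, then reduce with `any(axis=1)`. `in_any_range` is a hypothetical helper name and the sample frames are made up. The trade-off is memory: it builds len(B) × len(A) boolean arrays, so with 10,000 rows in each frame every intermediate array is about 100 MB.

```python
import numpy as np
import pandas as pd

def in_any_range(dates: pd.Series, starts: pd.Series, finishes: pd.Series) -> np.ndarray:
    """True where a date lies strictly inside at least one (start, finish) interval."""
    d = dates.to_numpy()[:, None]       # shape (len(dates), 1)
    s = starts.to_numpy()[None, :]      # shape (1, len(starts))
    f = finishes.to_numpy()[None, :]    # shape (1, len(finishes))
    # Broadcasting builds a (len(dates), len(starts)) boolean matrix;
    # any(axis=1) asks "does at least one interval contain this date?"
    return ((s < d) & (d < f)).any(axis=1)

# Hypothetical sample data; replace with the real A and B.
A = pd.DataFrame({
    "start": pd.to_datetime(["2021-01-01", "2021-02-01"]),
    "finish": pd.to_datetime(["2021-01-10", "2021-02-15"]),
})
B = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-05", "2021-01-20", "2021-02-10",
                            "2021-02-15", "2021-03-01"]),
})

B["in_range"] = in_any_range(B["date"], A["start"], A["finish"])
print(B)
```

Switch to `<=` and `>=` if dates equal to a start or finish should count, as in the second `%timeit` line above.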
    
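For inputs where a len(B) × len(A) comparison matrix would not fit in memory, a different technique (not from either answer) is to sort and merge the intervals once, then binary-search every date with `np.searchsorted`. This is a sketch assuming strict bounds like `in_range` above and integer or naive `datetime64[ns]` columns without missing values; `in_any_range_sorted` is a hypothetical helper name and the sample frames are made up. It takes O((len(A) + len(B)) log len(A)) time and linear memory.

```python
import numpy as np
import pandas as pd

def in_any_range_sorted(dates: pd.Series, starts: pd.Series, finishes: pd.Series) -> np.ndarray:
    """True where a date lies strictly inside at least one (start, finish) interval.

    Assumes integer or naive datetime64[ns] columns with no NaT/NaN values.
    """
    # Work on int64 (nanoseconds for datetime64[ns]) so every step is integer math.
    s = starts.to_numpy().astype("int64")
    f = finishes.to_numpy().astype("int64")
    d = dates.to_numpy().astype("int64")

    keep = s < f                                  # drop empty intervals
    order = np.argsort(s[keep], kind="stable")    # sort by start
    s, f = s[keep][order], f[keep][order]
    if len(s) == 0:
        return np.zeros(len(d), dtype=bool)

    # Merge overlapping open intervals: a new block begins where the start is
    # at or after the largest finish seen so far (touching intervals stay apart).
    run_max = np.maximum.accumulate(f)
    new_block = np.ones(len(s), dtype=bool)
    new_block[1:] = s[1:] >= run_max[:-1]
    first = np.flatnonzero(new_block)
    last = np.append(first[1:] - 1, len(s) - 1)
    block_start, block_finish = s[first], run_max[last]

    # The last block starting strictly before a date is its only candidate.
    idx = np.searchsorted(block_start, d, side="left") - 1
    hit = idx >= 0
    out = np.zeros(len(d), dtype=bool)
    out[hit] = d[hit] < block_finish[idx[hit]]
    return out

# Hypothetical sample data; replace with the real A and B.
A = pd.DataFrame({
    "start": pd.to_datetime(["2021-01-01", "2021-02-01"]),
    "finish": pd.to_datetime(["2021-01-10", "2021-02-15"]),
})
B = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-05", "2021-01-20", "2021-02-10",
                            "2021-02-15", "2021-03-01"]),
})

B["in_range"] = in_any_range_sorted(B["date"], A["start"], A["finish"])
print(B)
```

Touching intervals such as (1, 5) and (5, 9) are deliberately not merged, so a value of exactly 5 stays False under strict bounds.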
