pandas merge dataframes by closest time

后端 未结 2 1788
渐次进展
渐次进展 2021-01-05 08:03

I\'ve got two dataframes (logs and failures), which I would like to merge so that I add in logs a column which has the value of the cl

2条回答
  •  無奈伤痛
    2021-01-05 08:29

    In Pandas >= 0.19.0 you can now use pandas.merge_asof to get near matches. With 0.19 you're limited to taking the most recent failure value before or at the log value. However with 0.20 you can get the nearest in either direction.

    Perform an asof merge. This is similar to a left-join except that we match on nearest key rather than equal keys.

    For each row in the left DataFrame, we select the last row in the right DataFrame whose ‘on’ key is less than or equal to the left’s key. Both DataFrames must be sorted by the key.

    In [3]: failures.sort_values("date", inplace=True)
    
    In [6]: logs2=pd.DataFrame({'date-time':pd.Series(['23/10/2015 10:20:54','22/10/2015 09:51:32','21/10/2015 06:51:32','28/10/2015 16:59:32','25/10/2015 04:41:32','24/10/2015 11:50
       ...: :11', "20/10/2015 01:02:03"]),'var1':pd.Series([0,1,3,1,2,4, 99])})
       ...: 
    
    In [7]: logs2['date-time']=pd.to_datetime(logs2['date-time'])
    
    In [8]: logs2.sort_values("date-time", inplace=True)
    
    In [9]: logs2
    Out[9]: 
                date-time  var1
    6 2015-10-20 01:02:03    99
    2 2015-10-21 06:51:32     3
    1 2015-10-22 09:51:32     1
    0 2015-10-23 10:20:54     0
    5 2015-10-24 11:50:11     4
    4 2015-10-25 04:41:32     2
    3 2015-10-28 16:59:32     1
    
    In [10]: pd.merge_asof(logs2, failures, left_on="date-time", right_on="date")
    Out[10]: 
                date-time  var1       date  failure
    0 2015-10-20 01:02:03    99        NaT      NaN
    1 2015-10-21 06:51:32     3 2015-10-21      1.0
    2 2015-10-22 09:51:32     1 2015-10-22      1.0
    3 2015-10-23 10:20:54     0 2015-10-23      1.0
    4 2015-10-24 11:50:11     4 2015-10-23      1.0
    5 2015-10-25 04:41:32     2 2015-10-23      1.0
    6 2015-10-28 16:59:32     1 2015-10-23      1.0
    
    In [11]: pd.merge_asof(logs2, failures, left_on="date-time", right_on="date", direction="nearest")
    Out[11]: 
                date-time  var1       date  failure
    0 2015-10-20 01:02:03    99 2015-10-21        1
    1 2015-10-21 06:51:32     3 2015-10-21        1
    2 2015-10-22 09:51:32     1 2015-10-22        1
    3 2015-10-23 10:20:54     0 2015-10-23        1
    4 2015-10-24 11:50:11     4 2015-10-23        1
    5 2015-10-25 04:41:32     2 2015-10-23        1
    6 2015-10-28 16:59:32     1 2015-10-23        1
    

提交回复
热议问题