Merging dataframes based on date range

醉梦人生 2020-12-09 23:32

I have two pandas dataframes: one (df1) with three columns (StartDate, EndDate, and ID) and a second (df2) w

2 Answers
  •  庸人自扰
    2020-12-10 00:07

    A minor correction to @JianxunLi's answer; it is a bit too involved for a comment.

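    This relies on a property of np.piecewise: when len(funclist) == len(condlist) + 1, the extra last entry of funclist is used as the default value wherever no condition matches. Without it, the default no-match value is zero, which can cause problems (for example, if 0 could also be a legitimate ID).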
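
A minimal sketch of that default-value behaviour on toy data. The array `x`, the two conditions, and the constants 10.0 and 20.0 are illustrative assumptions, not part of the original answer:

```python
import numpy as np

x = np.array([0.5, 1.5, 2.5, 9.0])
# Two half-open conditions; the last two entries of x match neither.
conds = [(x >= 0) & (x < 1), (x >= 1) & (x < 2)]

# As many values as conditions: unmatched entries silently become 0.
no_default = np.piecewise(x, conds, [10.0, 20.0])

# One extra value: it is used wherever no condition holds.
with_default = np.piecewise(x, conds, [10.0, 20.0, np.nan])

print(no_default)
print(with_default)
```

With NaN as the fallback, unmatched rows can later be found with `isna()` instead of being confused with a real value.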

    ### Data / inits
    import pandas as pd
    import numpy as np
    
    df1 = pd.DataFrame({
        'StartDate': pd.date_range('2010-01-01', periods=9, freq='5D'),
        'EndDate': pd.date_range('2010-01-04', periods=9, freq='5D'),
        'ID': np.arange(1, 10, 1),
    })
    df2 = pd.DataFrame({
        'values': np.random.randn(50),
        'date_time': pd.date_range('2010-01-01', periods=50, freq='D'),
    })
    
    ### Processing
    valIfNoMatch = np.nan  # default for rows that fall in no interval
    dates = df2.date_time.values
    # One boolean mask per interval, half-open: StartDate <= date < EndDate
    condlist = [(dates >= start_date) & (dates < end_date)
                for start_date, end_date in zip(df1.StartDate.values, df1.EndDate.values)]
    # The extra trailing entry in funclist is the no-match default
    funclist = np.append(df1.ID.values, valIfNoMatch)
    df2['ID_matched'] = np.piecewise(np.zeros(len(df2)), condlist, funclist)
    

    PS. I also corrected the typo that tested both bounds inclusively (>= and <=). With inclusive bounds, a timestamp lying exactly on the boundary between two intervals would match both of them, which breaks a key assumption of the method.
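
The boundary issue can be checked by counting how many intervals each timestamp matches. This is only a sketch: the two back-to-back intervals and the three timestamps below are hypothetical and chosen so that one timestamp sits exactly on the shared boundary.

```python
import numpy as np
import pandas as pd

# Hypothetical adjacent intervals sharing the boundary 2010-01-04.
starts = pd.to_datetime(['2010-01-01', '2010-01-04']).values
ends = pd.to_datetime(['2010-01-04', '2010-01-07']).values
stamps = pd.to_datetime(['2010-01-02', '2010-01-04', '2010-01-05']).values

# Inclusive on both ends (the typo) versus half-open (the corrected test).
closed = [(stamps >= s) & (stamps <= e) for s, e in zip(starts, ends)]
half_open = [(stamps >= s) & (stamps < e) for s, e in zip(starts, ends)]

# Number of intervals matched by each timestamp.
matches_closed = np.sum(closed, axis=0)
matches_half_open = np.sum(half_open, axis=0)

print(matches_closed)     # the boundary timestamp matches two intervals
print(matches_half_open)  # every timestamp matches exactly one interval
```

A count above 1 means the conditions passed to np.piecewise are no longer mutually exclusive, so the assigned ID depends on the order of the conditions.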
