Merging dataframes based on date range

醉梦人生 2020-12-09 23:32

I have two pandas dataframes: one (df1) with three columns (StartDate, EndDate, and ID) and a second (df2) w

2 answers

庸人自扰 (OP)

2020-12-10 00:07

A minor correction to @JianxunLi's answer. It is a bit too involved for a comment.

This relies on a property of np.piecewise: when len(funclist) == len(condlist) + 1, the extra last entry of funclist is used as the default value for elements that match no condition. Without that extra entry, the no-match value is zero, which can cause problems.
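
To make the default-value behaviour concrete, here is a small sketch on toy data. The array `x` and the values 10, 20 are made up for illustration and are not part of the original question.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
# Two conditions; the elements 2.0 and 3.0 match neither of them
condlist = [x < 1, (x >= 1) & (x < 2)]

# Same number of values as conditions: unmatched elements become 0
no_default = np.piecewise(x, condlist, [10.0, 20.0])

# One extra value: it is used wherever no condition is true
with_default = np.piecewise(x, condlist, [10.0, 20.0, np.nan])

print(no_default)    # [10. 20.  0.  0.]
print(with_default)  # [10. 20. nan nan]
```

Note that the output takes the dtype of the first argument, so that argument has to be a float array for NaN to be representable.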

### Data / inits
import pandas as pd
import numpy as np

# df1: one row per interval, with the ID to assign
df1 = pd.DataFrame({
    'StartDate': pd.date_range('2010-01-01', periods=9, freq='5D'),
    'EndDate': pd.date_range('2010-01-04', periods=9, freq='5D'),
    'ID': np.arange(1, 10, 1),
})
# df2: one row per timestamp to be matched against the intervals
df2 = pd.DataFrame(dict(values=np.random.randn(50), date_time=pd.date_range('2010-01-01', periods=50, freq='D')))

### Processing
valIfNoMatch = np.nan
# One boolean mask per interval, half-open: StartDate <= date_time < EndDate
condlist = [(df2.date_time.values >= start_date) & (df2.date_time.values < end_date)
            for start_date, end_date in zip(df1.StartDate.values, df1.EndDate.values)]
# The extra trailing entry of funclist is the value used when no interval matches
df2['ID_matched'] = np.piecewise(np.zeros(len(df2)), condlist, np.append(df1.ID.values, valIfNoMatch))

PS. I also corrected the typo that tested both >= and <=. With both bounds inclusive, a timestamp lying exactly on a boundary shared by two intervals would return true for both of them, which breaks a key assumption of the method (each timestamp matches at most one interval).
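
A quick way to see the boundary issue is to count how many intervals each timestamp falls into. This is only a sketch: the two adjacent intervals and the single timestamp below are hypothetical and chosen so that they share a boundary, which the sample df1 above does not.

```python
import numpy as np
import pandas as pd

# Two hypothetical intervals that share the boundary 2010-01-05
starts = pd.to_datetime(['2010-01-01', '2010-01-05']).values
ends = pd.to_datetime(['2010-01-05', '2010-01-09']).values
# A timestamp lying exactly on the shared boundary
stamps = pd.to_datetime(['2010-01-05']).values

# Number of intervals matched per timestamp with both bounds inclusive
closed_hits = sum(((stamps >= s) & (stamps <= e)).astype(int)
                  for s, e in zip(starts, ends))
# Same count with the half-open test: start <= t < end
half_open_hits = sum(((stamps >= s) & (stamps < e)).astype(int)
                     for s, e in zip(starts, ends))

print(closed_hits)     # [2]
print(half_open_hits)  # [1]
```

If the half-open count ever exceeds 1, the intervals themselves overlap and the piecewise approach is ambiguous for those rows.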
