Pandas Merge on Name and Closest Date

北恋 2020-12-08 12:00

I am trying to merge two dataframes on both name and the closest date (with respect to the left-hand dataframe). In my research I found one similar question here but it doesn't account

3 Answers
  •  盖世英雄少女心
    2020-12-08 12:35

    This is super late, but hopefully it's helpful for new answer seekers. I answered a similar question here

    with a somewhat new method in pandas:

    pandas.merge_asof()

    The parameters of interest for you would be direction, tolerance, left_on, and right_on.
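To show how those parameters fit together for the "name plus closest date" case in the question, here is a minimal sketch. The frames, column names, and values are hypothetical, made up only for illustration. It also uses the `by` argument of `pandas.merge_asof`, which is not mentioned above, to restrict matches to rows with the same name.

```python
import pandas as pd

# Hypothetical sample data: names, dates and values are invented for illustration.
left = pd.DataFrame({
    "name": ["alice", "bob", "alice"],
    "date": pd.to_datetime(["2014-01-05", "2014-01-06", "2014-01-20"]),
    "left_val": [1, 2, 3],
})
right = pd.DataFrame({
    "name": ["alice", "alice", "bob"],
    "event_date": pd.to_datetime(["2014-01-04", "2014-01-19", "2014-02-01"]),
    "right_val": [10, 20, 30],
})

# merge_asof requires both frames to be sorted by their time key.
left = left.sort_values("date")
right = right.sort_values("event_date")

merged = pd.merge_asof(
    left,
    right,
    left_on="date",            # time key in the left frame
    right_on="event_date",     # time key in the right frame
    by="name",                 # only match rows that share the same name
    direction="nearest",       # look both backward and forward in time
    tolerance=pd.Timedelta("3 days"),  # drop matches further apart than this
)
print(merged)
```

In this sketch both "alice" rows find an event one day away, while the "bob" row is left unmatched (NaN / NaT) because his only event is more than three days from his date.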

    Building off @hernamesbarbara's answer & data:

    data

    a = """timepoint,measure
    2014-01-01 00:00:00,78
    2014-01-02 00:00:00,29
    2014-01-03 00:00:00,5
    2014-01-04 00:00:00,73
    2014-01-05 00:00:00,40
    2014-01-06 00:00:00,45
    2014-01-07 00:00:00,48
    2014-01-08 00:00:00,2
    2014-01-09 00:00:00,96
    2014-01-10 00:00:00,82
    2014-01-11 00:00:00,61
    2014-01-12 00:00:00,68
    2014-01-13 00:00:00,8
    2014-01-14 00:00:00,94
    2014-01-15 00:00:00,16
    2014-01-16 00:00:00,31
    2014-01-17 00:00:00,10
    2014-01-18 00:00:00,34
    2014-01-19 00:00:00,27
    2014-01-20 00:00:00,58
    2014-01-21 00:00:00,90
    2014-01-22 00:00:00,41
    2014-01-23 00:00:00,97
    2014-01-24 00:00:00,7
    2014-01-25 00:00:00,86
    2014-01-26 00:00:00,62
    2014-01-27 00:00:00,91
    2014-01-28 00:00:00,0
    2014-01-29 00:00:00,73
    2014-01-30 00:00:00,22
    2014-01-31 00:00:00,43
    2014-02-01 00:00:00,87
    2014-02-02 00:00:00,56
    2014-02-03 00:00:00,45
    2014-02-04 00:00:00,25
    2014-02-05 00:00:00,92
    2014-02-06 00:00:00,83
    2014-02-07 00:00:00,13
    2014-02-08 00:00:00,50
    2014-02-09 00:00:00,48
    2014-02-10 00:00:00,78"""
    
    b = """timepoint,measure
    2014-01-01 00:00:00,78
    2014-01-08 00:00:00,29
    2014-01-15 00:00:00,5
    2014-01-22 00:00:00,73
    2014-01-29 00:00:00,40
    2014-02-05 00:00:00,45
    2014-02-12 00:00:00,48
    2014-02-19 00:00:00,2
    2014-02-26 00:00:00,96
    2014-03-05 00:00:00,82
    2014-03-12 00:00:00,61
    2014-03-19 00:00:00,68
    2014-03-26 00:00:00,8
    2014-04-02 00:00:00,94
    """
    

    solution

    import pandas as pd
    from io import StringIO
    
    # parse_dates already converts 'timepoint' to datetime, so no separate pd.to_datetime call is needed
    df1 = pd.read_csv(StringIO(a), parse_dates=['timepoint'])
    df2 = pd.read_csv(StringIO(b), parse_dates=['timepoint'])
    
    # use 'timepoint' as the index while keeping it as a column, so both
    # timestamps show up in the result and you can validate the merging logic
    df1.index = df1['timepoint']
    df2.index = df2['timepoint']
    
    # the magic happens below, check the direction and tolerance arguments
    # tolerance is optional: it sets the maximum gap allowed between matched rows
    tol = pd.Timedelta('3 day')
    df3 = pd.merge_asof(left=df1, right=df2, left_index=True, right_index=True, direction='nearest', tolerance=tol)
    

    output

    df3.head()
    
               timepoint_x  measure_x timepoint_y  measure_y
    timepoint
    2014-01-01  2014-01-01         78  2014-01-01         78
    2014-01-02  2014-01-02         29  2014-01-01         78
    2014-01-03  2014-01-03          5  2014-01-01         78
    2014-01-04  2014-01-04         73  2014-01-01         78
    2014-01-05  2014-01-05         40  2014-01-08         29
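If you would rather check the matches programmatically than by eye, one option is to carry a copy of the right-hand timestamp through the merge and compute the gap. The sketch below uses a hypothetical miniature of the daily and weekly series (the `matched_timepoint` and `gap` column names are my own, not from the answer above) and is only meant to illustrate the idea.

```python
import pandas as pd

# Hypothetical miniature data: ten daily rows and two sparse right-hand rows.
daily = pd.DataFrame({"timepoint": pd.date_range("2014-01-01", periods=10, freq="D")})
daily["measure"] = list(range(10))
sparse = pd.DataFrame({"timepoint": pd.to_datetime(["2014-01-01", "2014-01-15"])})
sparse["measure"] = [100, 200]

# Keep a copy of the right-hand timestamp so it survives the merge on 'timepoint'.
sparse["matched_timepoint"] = sparse["timepoint"]

merged = pd.merge_asof(
    daily,
    sparse,
    on="timepoint",
    direction="nearest",
    tolerance=pd.Timedelta("3 days"),
    suffixes=("_daily", "_sparse"),
)

# Absolute distance between each left row and the row it was matched to;
# rows with no match within the tolerance have NaT here.
merged["gap"] = (merged["timepoint"] - merged["matched_timepoint"]).abs()
print(merged)
```

Every non-missing `gap` should be at most the tolerance; rows further than three days from any right-hand timestamp come back with missing values instead of a distant match.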
    
