Pandas Merge on Name and Closest Date

北恋 2020-12-08 12:00

I am trying to merge two dataframes on both name and the closest date (with respect to the left-hand dataframe). In my research I found one similar question here but it doesn't account

3 Answers
  •  伪装坚强ぢ
    2020-12-08 12:36

    A small addition to hernamesbarbara's code:

    import pandas as pd


    def find_closest_date(timepoint, time_series, add_time_delta_column=True, mode="abs"):
        """Take a pd.Timestamp and a pd.Series of dates, compute the delta
        between `timepoint` and each date in `time_series`, and return the
        closest date and, optionally, its time delta (a pd.Timedelta).
    
        Parameters
        ----------
        mode: "abs" (default), "left", "right"
            closest datetime by absolute difference, at or before `timepoint`
            ("left"), or at or after it ("right")
    
        References
        ----------
        .. [1] http://stackoverflow.com/a/25962323/716469
        """
        deltas = time_series - timepoint
        zero = pd.Timedelta(0)
    
        # idxmin/idxmax return index labels of `time_series`, so the lookups
        # below stay correct even after `deltas` has been filtered.
        idx_closest_date = None
        if mode == "abs":
            idx_closest_date = deltas.abs().idxmin()
        elif mode == "left":
            deltas_ = deltas[deltas <= zero]
            if len(deltas_):
                idx_closest_date = deltas_.idxmax()
        elif mode == "right":
            deltas_ = deltas[deltas >= zero]
            if len(deltas_):
                idx_closest_date = deltas_.idxmin()
        else:
            raise ValueError("Mode is incorrect")
    
        if idx_closest_date is not None:
            # .ix was removed from pandas; .loc does the label-based lookup
            closest_date = time_series.loc[idx_closest_date]
            if add_time_delta_column:
                closest_delta = deltas.loc[idx_closest_date]
        else:
            closest_date = pd.NaT
            if add_time_delta_column:
                closest_delta = pd.Timedelta(pd.NaT)
    
        res = {"closest_date": closest_date}
        idx = ['closest_date']
        if add_time_delta_column:
            res["closest_delta"] = closest_delta
            idx.append('closest_delta')
    
        return pd.Series(res, index=idx)
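
For the merge the question asks about (same name, closest date relative to the left-hand dataframe), pandas also ships `pd.merge_asof`, which may be worth trying before a row-by-row helper. The sketch below is not the method from this answer. The frames, the column names (`name`, `date`, `value`) and the sample rows are all hypothetical and only serve as illustration.

```python
import pandas as pd

# Hypothetical sample data; column names and values are assumptions.
left = pd.DataFrame({
    "name": ["alice", "alice", "bob"],
    "date": pd.to_datetime(["2020-01-10", "2020-03-01", "2020-02-15"]),
})
right = pd.DataFrame({
    "name": ["alice", "alice", "bob", "bob"],
    "date": pd.to_datetime(
        ["2020-01-08", "2020-02-25", "2020-01-01", "2020-02-20"]
    ),
    "value": [1, 2, 3, 4],
})

# Rename the right-hand date so both dates are visible after the merge.
right = right.rename(columns={"date": "closest_date"})

# merge_asof requires each frame to be sorted by its date key.
merged = pd.merge_asof(
    left.sort_values("date"),
    right.sort_values("closest_date"),
    left_on="date",
    right_on="closest_date",
    by="name",            # only match rows that share the same name
    direction="nearest",  # "backward" / "forward" roughly mirror "left" / "right"
)

# Signed difference, comparable to `closest_delta` in the helper above.
merged["closest_delta"] = merged["closest_date"] - merged["date"]
print(merged)
```

Note that `merged` comes back ordered by the left-hand `date`, not in the original row order, so keep the original index in a column first if the order matters.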
    
