Pandas - Date ranges that doesn't overlap

十年热恋 提交于 2021-02-11 13:50:54

问题


I'm getting lost in trying to find a easy way to determine when date ranges from 2 data frame are not overlapping.

I have 2 dataframes :

df1 = pd.DataFrame({
    'START':['2019-01-01 09:00:00', '2019-01-01 18:00:00'],
    'END':['2019-01-01 16:00:00', '2019-01-02 02:00:00']})
df2 = pd.DataFrame({
    'START':['2019-01-01 08:00:00', '2019-01-01 14:00:00', '2019-01-01 22:00:00', '2019-01-02 01:00:00'],
    'END':['2019-01-01 11:00:00', '2019-01-01 15:00:00', '2019-01-01 23:00:00', '2019-01-02 04:00:00']})

df1 :
                 START                  END
0  2019-01-01 09:00:00  2019-01-01 16:00:00
1  2019-01-01 18:00:00  2019-01-02 02:00:00

df2 :
                 START                  END
0  2019-01-01 08:00:00  2019-01-01 11:00:00
1  2019-01-01 14:00:00  2019-01-01 15:00:00
2  2019-01-01 22:00:00  2019-01-01 23:00:00
3  2019-01-02 01:00:00  2019-01-02 04:00:00

and I would like get the date range when df1 is not overlapped by df2:

                 START                  END
0  2019-01-01 11:00:00  2019-01-01 14:00:00
1  2019-01-01 15:00:00  2019-01-01 16:00:00
2  2019-01-01 18:00:00  2019-01-01 22:00:00
3  2019-01-01 23:00:00  2019-01-02 01:00:00

Thanks for your help.


回答1:


sympy and it's Interval could be used, here's an example

from sympy import Interval
from dateutil import parser
from datetime import datetime
import pandas as pd

def remove_overlap(time_intervals, overlapping_intervals):
    time_span = None

    for _, interval in time_intervals.iterrows():
        start_ts = parser.parse(interval["START"]).timestamp()
        end_ts = parser.parse(interval["END"]).timestamp()
        interval_ts = Interval(start_ts, end_ts)
        time_span = interval_ts + time_span if time_span else interval_ts

    for _, interval in overlapping_intervals.iterrows():
        start_ts = parser.parse(interval["START"]).timestamp()
        end_ts = parser.parse(interval["END"]).timestamp()
        interval_ts = Interval(start_ts, end_ts)
        time_span = time_span - interval_ts

    bounds_ts = list(time_span.boundary)
    bounds_dates = [datetime.fromtimestamp(t).strftime("%Y-%m-%d %H:%M:%S") for t in bounds_ts]

    df = pd.DataFrame({"START": bounds_dates[0::2], "END":bounds_dates[1::2]})
    return df

df1 = pd.DataFrame({
    'START':['2019-01-01 09:00:00', '2019-01-01 18:00:00'],
    'END':['2019-01-01 16:00:00', '2019-01-02 02:00:00']})
df2 = pd.DataFrame({
    'START':['2019-01-01 08:00:00', '2019-01-01 14:00:00', '2019-01-01 22:00:00', '2019-01-02 01:00:00'],
    'END':['2019-01-01 11:00:00', '2019-01-01 15:00:00', '2019-01-01 23:00:00', '2019-01-02 04:00:00']})

remove_overlap(df1,df2)

this outputs

START   END
0   2019-01-01 11:00:00 2019-01-01 14:00:00
1   2019-01-01 15:00:00 2019-01-01 16:00:00
2   2019-01-01 18:00:00 2019-01-01 22:00:00
3   2019-01-01 23:00:00 2019-01-02 01:00:00


来源:https://stackoverflow.com/questions/55753399/pandas-date-ranges-that-doesnt-overlap

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!