Exclude overlapping periods in time aggregate function

℡╲_俬逩灬. 提交于 2021-01-28 05:28:08

问题


I have a table containing each a start and and end date:

DROP TABLE temp_period;

CREATE TABLE public.temp_period
(
  id integer NOT NULL,
  "startDate" date,
  "endDate" date
);

INSERT INTO temp_period(id,"startDate","endDate") VALUES(1,'2010-01-01','2010-03-31');
INSERT INTO temp_period(id,"startDate","endDate") VALUES(2,'2013-05-17','2013-07-18');
INSERT INTO temp_period(id,"startDate","endDate") VALUES(3,'2010-02-15','2010-05-31');
INSERT INTO temp_period(id,"startDate","endDate") VALUES(7,'2014-01-01','2014-12-31');
INSERT INTO temp_period(id,"startDate","endDate") VALUES(56,'2014-03-31','2014-06-30');

Now I want to know the total duration of all periods stored there. I need just the time as an interval. That's pretty easy:

SELECT sum(age("endDate","startDate")) FROM temp_period;

However, the problem is: Those periods do overlap. And I want to eliminate all overlapping periods, so that I get the total amount of time which is covered by at least one record in the table.

You see, there are quite some gaps in between the times, so passing the smallest start date and the most recent end date to the age function won't do the trick. However, I thought about doing that and subtracting the total amount of gaps, but no elegant way to do that came into my mind.

I use PostgreSQL 9.6.


回答1:


What about this:

WITH
   /* get all time points where something changes */
   points AS (
       SELECT "startDate" AS p
       FROM temp_period
       UNION SELECT "endDate"
       FROM temp_period
   ),
   /*
    * Get all date ranges between these time points.
    * The first time range will start with NULL,
    * but that will be excluded in the next CTE anyway.
    */
   inter AS (
      SELECT daterange(
                lag(p) OVER (ORDER BY p),
                p
             ) i
      FROM points
   ),
   /*
    * Get all date ranges that are contained
    * in at least one of the intervals.
    */
   overlap AS (
      SELECT DISTINCT i
      FROM inter
         CROSS JOIN temp_period
      WHERE i <@ daterange("startDate", "endDate")
   )
/* sum the lengths of the date ranges */
SELECT sum(age(upper(i), lower(i)))
FROM overlap;

For your data it will return:

┌──────────┐
│ interval │
├──────────┤
│ 576 days │
└──────────┘
(1 row)



回答2:


You could try to use recursive cte to calculate the period. For each record, we will check if it's overlapped with previous records. If it is, we only calculate the period that is not overlapping.

WITH RECURSIVE days_count AS 
  ( 
         SELECT startDate, 
                endDate, 
                AGE(endDate, startDate) AS total_days, 
                rowSeq 
         FROM   ordered_data 
         WHERE  rowSeq = 1 
         UNION ALL 
         SELECT     GREATEST(curr.startDate, prev.endDate)                                            AS startDate,
                    GREATEST(curr.endDate, prev.endDate)                                              AS endDate,
                    AGE(GREATEST(curr.endDate, prev.endDate), GREATEST(curr.startDate, prev.endDate)) AS total_days,
                    curr.rowSeq 
         FROM       ordered_data curr 
         INNER JOIN days_count prev 
         ON         curr.rowSeq > 1 
         AND        curr.rowSeq = prev.rowSeq + 1), 
ordered_data AS 
  ( 
           SELECT   *, 
                    ROW_NUMBER() OVER (ORDER BY startDate) AS rowSeq 
           FROM     temp_period) 
SELECT SUM(total_days) AS total_days
FROM   days_count;

I've created a demo here




回答3:


This one required two outer joins on a complex query. One join to identify all overlaps with a startdate larger than THIS and to expand the timespan to match the larger of the two. The second join is needed to match records with no overlaps. Take the Min of the min and the max of the max, including non matched. I was using MSSQL so the syntax may be a bit different.

DECLARE @temp_period TABLE
(
  id int NOT NULL,
  startDate datetime,
  endDate datetime
)

INSERT INTO @temp_period(id,startDate,endDate) VALUES(1,'2010-01-01','2010-03-31')
INSERT INTO @temp_period(id,startDate,endDate) VALUES(2,'2013-05-17','2013-07-18')
INSERT INTO @temp_period(id,startDate,endDate) VALUES(3,'2010-02-15','2010-05-31')
INSERT INTO @temp_period(id,startDate,endDate) VALUES(3,'2010-02-15','2010-07-31')
INSERT INTO @temp_period(id,startDate,endDate) VALUES(7,'2014-01-01','2014-12-31')
INSERT INTO @temp_period(id,startDate,endDate) VALUES(56,'2014-03-31','2014-06-30')


;WITH OverLaps AS
(
    SELECT 
        Main.id,
        OverlappedID=Overlaps.id,
        OverlapMinDate,
        OverlapMaxDate
    FROM
        @temp_period Main
        LEFT OUTER JOIN
        (
            SELECT 
                This.id,
                OverlapMinDate=CASE WHEN This.StartDate<Prior.StartDate THEN This.StartDate ELSE Prior.StartDate END,
                OverlapMaxDate=CASE WHEN This.EndDate>Prior.EndDate THEN This.EndDate ELSE Prior.EndDate END,
                PriorID=Prior.id
            FROM
                @temp_period This
                LEFT OUTER JOIN @temp_period Prior ON Prior.endDate > This.startDate AND Prior.startdate < this.endDate AND This.Id<>Prior.ID
        ) Overlaps ON Main.Id=Overlaps.PriorId
)

SELECT
    T.Id,
    --If has overlapped then sum all overlapped records prior to this one, else not and overlap get the start and end
    MinDate=MIN(COALESCE(HasOverlapped.OverlapMinDate,startDate)),
    MaxDate=MAX(COALESCE(HasOverlapped.OverlapMaxDate,endDate))
FROM
    @temp_period T
    LEFT OUTER JOIN OverLaps IsAOverlap ON IsAOverlap.OverlappedID=T.id
    LEFT OUTER JOIN OverLaps HasOverlapped ON HasOverlapped.Id=T.id
WHERE
    IsAOverlap.OverlappedID IS NULL -- Exclude older records that have overlaps
GROUP BY
    T.Id


来源:https://stackoverflow.com/questions/44134635/exclude-overlapping-periods-in-time-aggregate-function

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!