Time interval overlaps - teradata

Question

I need help with interval overlaps. I have these records in one table (and many more):

Example 1:

Id        StartDate     EndDate
794122    2011-05-10    2999-12-31
794122    2011-04-15    2999-12-31
794122    2008-04-03    2999-12-31
794122    2008-03-31    2999-12-31
794122    2008-02-29    2999-12-31
794122    2008-02-04    2999-12-31
794122    2007-10-10    2999-12-31
794122    2007-09-15    2999-12-31

Example 2:

Id      StartDate     EndDate
5448    2012-12-28    2999-12-31
5448    2011-06-30    2999-12-31
5448    2005-12-26    2011-06-30
5448    2005-06-15    2011-06-30
5448    2006-07-31    2006-12-31
5448    2001-03-31    2006-07-15

Example 3:

Id        StartDate     EndDate
214577    2007-02-28    2999-12-31
214577    2003-06-20    2007-03-04
214577    2003-06-20    2007-02-28

Example 4:

Id      StartDate     EndDate
9999    2008-05-28    2999-01-01
9999    2005-03-03    2008-05-31
9999    2005-05-31    2005-12-31
9999    2003-12-01    2005-08-12
9999    2001-01-01    2002-03-05
9999    2000-01-08    2002-01-01

I would like to get:

*Example1* - 2007-09-15->3000-01-01

*Example2* - 2001-03-31->3000-01-01

*Example3* - 2003-06-20->3000-01-01

*Example4* - 2003-12-01->3000-01-01

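Do you have any suggestions on how to do this? I can't simply take the min and max values (GROUP BY Id); example 4 shows the problem: there is a gap between 2002-03-05 and 2003-12-01, so the result has to start at 2003-12-01, not at the earliest StartDate.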

Thanks!

Answer 1:

The result for example #4 doesn't match your data. Shouldn't it be 9999, 2999-01-02 instead of 3000-01-01?

A typical solution for combining overlapping periods uses nested OLAP functions. For your specific requirement (only the latest period) it can be simplified a bit to:

SELECT *
FROM
 (
   SELECT DISTINCT -- DISTINCT is not necessary, but results in a better plan
      Id,
      StartDate,
      MAX(EndDate) 
      OVER (PARTITION BY Id) + 1 AS EndDate
   FROM dropme AS t
   QUALIFY -- find the gap
      COALESCE(StartDate 
               - MAX(EndDate) 
                 OVER (PARTITION BY Id
                       ORDER BY StartDate, EndDate
                       ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 1) > 0
 ) AS dt
QUALIFY 
   ROW_NUMBER() 
   OVER (PARTITION BY Id
         ORDER BY StartDate DESC) = 1
;
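
To see what the gap check does, here is a minimal Python sketch of the same logic applied to example 4. It is not the Teradata query itself, only an illustration; the names `periods` and `latest_period` are made up for this sketch, and it assumes a non-empty list of rows for a single Id.

```python
from datetime import date, timedelta

# Example 4 from the question: (StartDate, EndDate) pairs for Id 9999.
periods = [
    (date(2008, 5, 28), date(2999, 1, 1)),
    (date(2005, 3, 3), date(2008, 5, 31)),
    (date(2005, 5, 31), date(2005, 12, 31)),
    (date(2003, 12, 1), date(2005, 8, 12)),
    (date(2001, 1, 1), date(2002, 3, 5)),
    (date(2000, 1, 8), date(2002, 1, 1)),
]


def latest_period(rows):
    """Return (start, max end + 1 day) of the last merged period in rows."""
    latest_start = None
    running_max_end = None  # MAX(EndDate) over all preceding rows
    for start, end in sorted(rows):  # ORDER BY StartDate, EndDate
        # A row opens a new period when it starts after every earlier end.
        # The first row always does, like the COALESCE(..., 1) default.
        if running_max_end is None or start > running_max_end:
            latest_start = start
        if running_max_end is None or end > running_max_end:
            running_max_end = end
    # Mirrors MAX(EndDate) OVER (PARTITION BY Id) + 1
    return latest_start, running_max_end + timedelta(days=1)


result = latest_period(periods)
print(result)  # (datetime.date(2003, 12, 1), datetime.date(2999, 1, 2))
```

The only row after the first that starts later than all preceding end dates is 2003-12-01, so that is where the latest period begins, and the end date comes out as 2999-01-02 as noted above.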

Answer 2:

You want the end date to be the first day of the following year?

select id, min(startdate) start_date, 
       cast(max(extract(year from enddate)) + 1 || '-01-01' as date) end_date
from table1
group by id

Answer 3:

Are you just trying to do this?

select id, min(start_date) as start_date, max(end_date) as end_date
from t
group by id;

EDIT:

Now that I understand what you need, here is a query for it. It identifies the rows that start a new period (using the NOT EXISTS clause to look for overlaps), then chooses the maximum start_date among those rows for each id:

select t.id, min(t.start_date) as start_date, max(t.end_date) as end_date
from (select id, max(start_date) as maxsd
      from t
      where not exists (select 1
                        from t t2
                        where t2.id = t.id and -- only compare rows of the same id
                              t2.start_date < t.start_date and
                              t2.end_date >= t.start_date
                       )
      group by id
     ) ids join
     t
     on t.id = ids.id and
        t.start_date >= ids.maxsd
group by t.id;

The final step joins back to the original data and aggregates every row that starts on or after that start date.
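
As a rough check, the NOT EXISTS approach can be tried against examples 3 and 4 in an in-memory SQLite database. This is only a sketch: SQLite stands in for Teradata, dates are stored as ISO text (which compares correctly as strings), and the NOT EXISTS subquery here compares rows of the same id only (the `t2.id = t.id` condition), which is an assumption about the intent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        # Example 3
        (214577, "2007-02-28", "2999-12-31"),
        (214577, "2003-06-20", "2007-03-04"),
        (214577, "2003-06-20", "2007-02-28"),
        # Example 4
        (9999, "2008-05-28", "2999-01-01"),
        (9999, "2005-03-03", "2008-05-31"),
        (9999, "2005-05-31", "2005-12-31"),
        (9999, "2003-12-01", "2005-08-12"),
        (9999, "2001-01-01", "2002-03-05"),
        (9999, "2000-01-08", "2002-01-01"),
    ],
)

query = """
select t.id, min(t.start_date) as start_date, max(t.end_date) as end_date
from (select id, max(start_date) as maxsd
      from t
      where not exists (select 1
                        from t t2
                        where t2.id = t.id and
                              t2.start_date < t.start_date and
                              t2.end_date >= t.start_date)
      group by id) ids
join t
  on t.id = ids.id and
     t.start_date >= ids.maxsd
group by t.id
order by t.id
"""

rows = conn.execute(query).fetchall()
print(rows)
# [(9999, '2003-12-01', '2999-01-01'), (214577, '2003-06-20', '2999-12-31')]
```

Note that this returns the maximum end_date as stored; the extra day in the asker's expected output (3000-01-01) would still have to be added.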

Source: https://stackoverflow.com/questions/17946389/time-interval-overlaps-teradata

Tags

sql

teradata