Question
I need help with interval overlaps. I have records like these in one table (and many more):
Example 1:
Id      StartDate   EndDate
794122  2011-05-10  2999-12-31
794122  2011-04-15  2999-12-31
794122  2008-04-03  2999-12-31
794122  2008-03-31  2999-12-31
794122  2008-02-29  2999-12-31
794122  2008-02-04  2999-12-31
794122  2007-10-10  2999-12-31
794122  2007-09-15  2999-12-31
Example 2:
Id      StartDate   EndDate
5448    2012-12-28  2999-12-31
5448    2011-06-30  2999-12-31
5448    2005-12-26  2011-06-30
5448    2005-06-15  2011-06-30
5448    2006-07-31  2006-12-31
5448    2001-03-31  2006-07-15
Example 3:
Id      StartDate   EndDate
214577  2007-02-28  2999-12-31
214577  2003-06-20  2007-03-04
214577  2003-06-20  2007-02-28
Example 4:
Id      StartDate   EndDate
9999    2008-05-28  2999-01-01
9999    2005-03-03  2008-05-31
9999    2005-05-31  2005-12-31
9999    2003-12-01  2005-08-12
9999    2001-01-01  2002-03-05
9999    2000-01-08  2002-01-01
I would like to get:
Example 1: 2007-09-15 -> 3000-01-01
Example 2: 2001-03-31 -> 3000-01-01
Example 3: 2003-06-20 -> 3000-01-01
Example 4: 2003-12-01 -> 3000-01-01
Do you have any suggestions for how to do this? I can't simply take the min and max values (GROUP BY Id): example 4 has a gap between its periods, so the min StartDate would be wrong there.
Thanks!
Answer 1:
The result for example #4 doesn't match your data. Shouldn't it be 9999, 2999-01-02 instead of 3000-01-01?
A typical solution for combining overlapping periods uses nested OLAP functions. For your specific requirement (only the latest period) it can be simplified a bit to:
SELECT *
FROM
 (
   SELECT DISTINCT -- DISTINCT is not necessary, but results in a better plan
      Id,
      StartDate,
      MAX(EndDate)
      OVER (PARTITION BY Id) + 1 AS EndDate
   FROM dropme AS t
   QUALIFY -- find the gap
      COALESCE(StartDate
               - MAX(EndDate)
                 OVER (PARTITION BY Id
                       ORDER BY StartDate, EndDate
                       ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 1) > 0
 ) AS dt
QUALIFY
   ROW_NUMBER()
   OVER (PARTITION BY Id
         ORDER BY StartDate DESC) = 1
;
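To make the gap test easier to follow, here is a small Python sketch of the same idea. It is not Teradata code and not a replacement for the query: it just walks the rows of one Id in StartDate, EndDate order, keeps a running maximum of EndDate, and treats a row as the start of a new period when its StartDate is later than that running maximum. The function name `latest_merged_period` and the in-memory list of tuples are assumptions made for illustration.

```python
from datetime import date, timedelta


def latest_merged_period(rows):
    """rows: non-empty list of (start, end) date pairs for a single Id.

    Hypothetical helper, only meant to mirror the logic of the query.
    """
    ordered = sorted(rows)  # same order as ORDER BY StartDate, EndDate
    period_start, running_max_end = ordered[0]
    for start, end in ordered[1:]:
        # Gap: this row starts after every earlier row has already ended.
        if start > running_max_end:
            period_start = start
        running_max_end = max(running_max_end, end)
    # Mirror the "+ 1" that the query applies to the overall MAX(EndDate).
    return period_start, running_max_end + timedelta(days=1)


# Example 4 from the question.
example4 = [
    (date(2008, 5, 28), date(2999, 1, 1)),
    (date(2005, 3, 3), date(2008, 5, 31)),
    (date(2005, 5, 31), date(2005, 12, 31)),
    (date(2003, 12, 1), date(2005, 8, 12)),
    (date(2001, 1, 1), date(2002, 3, 5)),
    (date(2000, 1, 8), date(2002, 1, 1)),
]

print(latest_merged_period(example4))
# (datetime.date(2003, 12, 1), datetime.date(2999, 1, 2))
```

The only gap in example 4 is between 2002-03-05 and 2003-12-01, so the latest period starts on 2003-12-01, and the end date comes out as 2999-01-02 as noted above.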
Answer 2:
You want the end date to be the first day of the following year?
select id, min(startdate) start_date,
cast(max(extract(year from enddate)) + 1 || '-01-01' as date) end_date
from table1
group by id
Answer 3:
Are you just trying to do this?
select id, min(start_date) as start_date, max(end_date) as end_date
from t
group by id;
EDIT:
Now that I understand what you need, here is a revised query. It identifies the rows that start a new period (using the not exists clause to look for overlaps). It then chooses the maximum start_date among those rows for each id:
select t.id, min(t.start_date) as start_date, max(t.end_date) as end_date
from (select id, max(start_date) as maxsd
      from t
      where not exists (select 1
                        from t t2
                        where t2.id = t.id and -- only compare rows of the same id
                              t2.start_date < t.start_date and
                              t2.end_date >= t.start_date
                       )
      group by id
     ) ids join
     t
     on t.id = ids.id and
        t.start_date >= maxsd
group by t.id;
The final step joins back to the original data and aggregates over every row that starts on or after that latest period start.
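If you want to check the logic without a Teradata system, the following Python sketch runs the query in an in-memory SQLite database on examples 2 and 4 from the question. The assumptions here are mine: SQLite instead of Teradata, dates stored as ISO text (which still compare correctly), and the not exists subquery correlated on id (`t2.id = t.id`) so that rows of different ids are never treated as overlapping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        # Example 2 from the question
        (5448, "2012-12-28", "2999-12-31"),
        (5448, "2011-06-30", "2999-12-31"),
        (5448, "2005-12-26", "2011-06-30"),
        (5448, "2005-06-15", "2011-06-30"),
        (5448, "2006-07-31", "2006-12-31"),
        (5448, "2001-03-31", "2006-07-15"),
        # Example 4 from the question
        (9999, "2008-05-28", "2999-01-01"),
        (9999, "2005-03-03", "2008-05-31"),
        (9999, "2005-05-31", "2005-12-31"),
        (9999, "2003-12-01", "2005-08-12"),
        (9999, "2001-01-01", "2002-03-05"),
        (9999, "2000-01-08", "2002-01-01"),
    ],
)

query = """
select t.id, min(t.start_date) as start_date, max(t.end_date) as end_date
from (select id, max(start_date) as maxsd
      from t
      where not exists (select 1
                        from t t2
                        where t2.id = t.id and
                              t2.start_date < t.start_date and
                              t2.end_date >= t.start_date
                       )
      group by id
     ) ids join
     t
     on t.id = ids.id and
        t.start_date >= ids.maxsd
group by t.id
order by t.id
"""

results = conn.execute(query).fetchall()
for row in results:
    print(row)
# (5448, '2001-03-31', '2999-12-31')
# (9999, '2003-12-01', '2999-01-01')
```

Note that this returns the raw end dates; add one day to them if you want 3000-01-01 style results as in the question.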
Source: https://stackoverflow.com/questions/17946389/time-interval-overlaps-teradata