I have a table like this:
ID BEGIN END
If there are overlapping episodes for the same ID (like 2000-01-01
- 2001-1
Edit: It is great news that your DBA agreed to upgrade to a newer version of PostgreSQL. The window functions alone make the upgrade a worthwhile investment.
My original answer, as you note, has a major flaw: it is limited to one row per id.
Below is a better solution without that limitation.
I have tested it using test tables on my system (8.4).
If / when you get a moment I would like to know how it performs on your data.
I also wrote up an explanation here: https://www.mechanical-meat.com/1/detail
    WITH RECURSIVE t1_rec ( id, "begin", "end", n ) AS (
        SELECT id, "begin", "end", n
        FROM (
            SELECT
                id, "begin", "end",
                CASE
                    WHEN LEAD("begin") OVER (
                        PARTITION BY id
                        ORDER BY "begin") <= ("end" + interval '2' day)
                    THEN 1 ELSE 0 END AS cl,
                ROW_NUMBER() OVER (
                    PARTITION BY id
                    ORDER BY "begin") AS n
            FROM mytable
        ) s
        WHERE s.cl = 1
        UNION ALL
        SELECT p1.id, p1."begin", p1."end", a.n
        FROM t1_rec a
        JOIN mytable p1 ON p1.id = a.id
            AND p1."begin" > a."begin"
            AND (a."begin", a."end" + interval '2' day) OVERLAPS
                (p1."begin", p1."end")
    )
    SELECT t1.id, min(t1."begin"), max(t1."end")
    FROM t1_rec t1
    LEFT JOIN t1_rec t2 ON t1.id = t2.id
        AND t2."end" = t1."end"
        AND t2.n < t1.n
    WHERE t2.n IS NULL
    GROUP BY t1.id, t1.n
    ORDER BY t1.id, t1.n;
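A small aside on the OVERLAPS predicate used in the recursive step: PostgreSQL treats each period as half-open (start <= time < end), so two episodes that merely touch do not count as overlapping. The sketch below uses made-up dates, not rows from the question, to show why padding the end by two days is what lets an episode ending on June 30 pick up one beginning on July 1.

```sql
-- Hypothetical dates, for illustration only.

-- No padding: the episodes fall on consecutive days and do not overlap.
SELECT (DATE '2000-01-01', DATE '2000-06-30') OVERLAPS
       (DATE '2000-07-01', DATE '2000-12-31') AS no_padding;   -- false

-- One day of padding: the first period now ends at 2000-07-01 00:00,
-- exactly where the second begins; half-open periods that touch
-- are still not considered overlapping.
SELECT (DATE '2000-01-01', DATE '2000-06-30' + interval '1' day) OVERLAPS
       (DATE '2000-07-01', DATE '2000-12-31') AS one_day;      -- false

-- Two days of padding: the first period reaches past the start
-- of the second, so the pair qualifies.
SELECT (DATE '2000-01-01', DATE '2000-06-30' + interval '2' day) OVERLAPS
       (DATE '2000-07-01', DATE '2000-12-31') AS two_days;     -- true
```

If your columns are timestamps rather than dates, the padding you need may differ, so it is worth running a check like this against a few of your own boundary cases.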
Original (deprecated) answer follows.
Note: it is limited to one row per id.
Denis is probably right about using lead() and lag(), but there is yet another way!
You can also solve this problem using so-called recursive SQL.
The overlaps function also comes in handy.
I have fully tested this solution on my system (8.4).
It works well.
    /* "begin" and "end" are quoted because END is a reserved word in PostgreSQL */
    WITH RECURSIVE rec_stmt ( id, "begin", "end" ) AS (
        /* seed statement:
           start with only first start and end dates for each id
        */
        SELECT id, MIN("begin"), MIN("end")
        FROM mytable seed_stmt
        GROUP BY id
        UNION ALL
        /* iterative (not really recursive) statement:
           append qualifying rows to resultset
        */
        SELECT t1.id, t1."begin", t1."end"
        FROM rec_stmt r
        JOIN mytable t1 ON t1.id = r.id
            AND t1."begin" > r."end"
            AND (r."begin", r."end" + INTERVAL '1' DAY) OVERLAPS
                (t1."begin" - INTERVAL '1' DAY, t1."end")
    )
    SELECT MIN("begin"), MAX("end")
    FROM rec_stmt
    GROUP BY id;
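For comparison, the lead()/lag() idea mentioned above can be sketched without recursion. This is not the query from either answer, just a window-function variant under stated assumptions: the table episodes_demo and its rows are hypothetical, the columns are plain dates, and two episodes are merged when the later one begins no later than the day after the running end of the earlier ones. It should run on 8.4 or later, since it only relies on MAX(), LAG() and SUM() as window functions with the default frame.

```sql
-- Hypothetical demo table and rows; adjust names and types to your schema.
CREATE TEMP TABLE episodes_demo (
    id      integer,
    "begin" date,
    "end"   date
);

INSERT INTO episodes_demo (id, "begin", "end") VALUES
    (1, DATE '2000-01-01', DATE '2001-12-31'),
    (1, DATE '2001-06-01', DATE '2002-03-15'),  -- overlaps the previous row
    (1, DATE '2002-03-16', DATE '2002-12-31'),  -- begins the day after
    (1, DATE '2004-01-01', DATE '2004-06-30'),  -- separate episode
    (2, DATE '2010-01-01', DATE '2010-12-31'),
    (2, DATE '2010-03-01', DATE '2010-04-01'),  -- nested in the previous row
    (2, DATE '2011-01-02', DATE '2011-02-01');  -- one-day gap, stays separate

SELECT id, MIN("begin") AS "begin", MAX("end") AS "end"
FROM (
    -- running count of episode starts = episode number within each id
    SELECT id, "begin", "end",
           SUM(is_start) OVER (PARTITION BY id ORDER BY "begin", "end") AS grp
    FROM (
        -- a row starts a new episode unless it begins within one day
        -- of the latest end seen on the preceding rows
        SELECT id, "begin", "end",
               CASE WHEN "begin" <= LAG(run_end) OVER (
                             PARTITION BY id
                             ORDER BY "begin", "end") + interval '1' day
                    THEN 0 ELSE 1 END AS is_start
        FROM (
            -- latest end seen so far, so nested rows cannot shorten an episode
            SELECT id, "begin", "end",
                   MAX("end") OVER (PARTITION BY id
                                    ORDER BY "begin", "end") AS run_end
            FROM episodes_demo
        ) a
    ) b
) c
GROUP BY id, grp
ORDER BY id, grp;

-- With the rows above this yields:
--   1 | 2000-01-01 | 2002-12-31
--   1 | 2004-01-01 | 2004-06-30
--   2 | 2010-01-01 | 2010-12-31
--   2 | 2011-01-02 | 2011-02-01
```

The extra nesting level for run_end is there because comparing against only the immediately preceding row's end would miss a long episode that contains several shorter ones. If you need a wider tolerance between episodes, change the interval in the CASE expression.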