Question
Assume you have (in Postgres 9.1) a table like this:
date | value
which has some gaps in it (that is, not every possible date between min(date) and max(date) has its own row).
My problem is how to aggregate this data so that each consecutive group (one without gaps) is treated separately, like this:
min_date | max_date | [some aggregate of "value" column]
Any ideas how to do it? I believe it is possible with window functions, but after a while trying with lag() and lead() I'm a little stuck.
For instance if the data are like this:
date | value
---------------+-------
2011-10-31 | 2
2011-11-01 | 8
2011-11-02 | 10
2012-09-13 | 1
2012-09-14 | 4
2012-09-15 | 5
2012-09-16 | 20
2012-10-30 | 10
the output (for sum as the aggregate) would be:
min | max | sum
-----------+------------+-------
2011-10-31 | 2011-11-02 | 20
2012-09-13 | 2012-09-16 | 30
2012-10-30 | 2012-10-30 | 10
Answer 1:
create table t ("date" date, "value" int);
insert into t ("date", "value") values
('2011-10-31', 2),
('2011-11-01', 8),
('2011-11-02', 10),
('2012-09-13', 1),
('2012-09-14', 4),
('2012-09-15', 5),
('2012-09-16', 20),
('2012-10-30', 10);
Simpler and cheaper version:
select min("date"), max("date"), sum(value)
from (
    select
        "date", value,
        -- consecutive dates minus their rank all yield the same date, which labels the group
        "date" - (dense_rank() over(order by "date"))::int g
    from t
) s
group by s.g
order by 1;
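To see why subtracting the rank works, here is a minimal Python sketch of the same idea outside the database. It is an illustration only, not part of the original answer: the function name `group_consecutive` and the in-memory `rows` list are assumptions made for the example. Within a run of consecutive dates, the date and its dense rank both grow by one per row, so their difference stays constant; after a gap the difference jumps.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample data from the question, as (date, value) pairs.
rows = [
    (date(2011, 10, 31), 2),
    (date(2011, 11, 1), 8),
    (date(2011, 11, 2), 10),
    (date(2012, 9, 13), 1),
    (date(2012, 9, 14), 4),
    (date(2012, 9, 15), 5),
    (date(2012, 9, 16), 20),
    (date(2012, 10, 30), 10),
]


def group_consecutive(rows):
    """Return (min_date, max_date, sum) for each run of consecutive dates."""
    ordered = sorted(rows)
    # Dense rank: 1, 2, 3, ... over the distinct dates in ascending order.
    rank = {d: i for i, d in enumerate(sorted({d for d, _ in rows}), start=1)}

    def group_key(row):
        # date - rank is identical for every row of one consecutive run.
        return row[0] - timedelta(days=rank[row[0]])

    result = []
    for _, run in groupby(ordered, key=group_key):
        run = list(run)
        result.append((run[0][0], run[-1][0], sum(v for _, v in run)))
    return result


for lo, hi, total in group_consecutive(rows):
    print(lo, hi, total)
```

The key never decreases as dates increase, so grouping adjacent rows by it is enough here; in SQL the same key is simply handed to group by.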
My first try was more complex and expensive:
create temporary sequence s;
select min("date"), max("date"), sum(value)
from (
    select
        "date", value, d,
        case
            when lag("date", 1, null) over(order by s.d) is null and "date" is not null
                then nextval('s')
            when lag("date", 1, null) over(order by s.d) is not null and "date" is not null
                then lastval()
            else 0
        end g
    from
        t
        right join
        generate_series(
            (select min("date") from t)::date,
            (select max("date") from t)::date + 1,
            '1 day'
        ) s(d) on s.d::date = t."date"
) q
where g != 0
group by g
order by 1
;
drop sequence s;
The output:
min | max | sum
------------+------------+-----
2011-10-31 | 2011-11-02 | 20
2012-09-13 | 2012-09-16 | 30
2012-10-30 | 2012-10-30 | 10
(3 rows)
Answer 2:
Here is a way of solving it.
First, to get the beginning of each consecutive series, this query gives you the first date:
SELECT first.date
FROM raw_data first
LEFT OUTER JOIN raw_data prior_first ON first.date = prior_first.date + 1
WHERE prior_first.date IS NULL
Likewise, for the end of each consecutive series:
SELECT last.date
FROM raw_data last
LEFT OUTER JOIN raw_data after_last ON last.date = after_last.date - 1
WHERE after_last.date IS NULL
You might consider making these views, to simplify queries using them.
We only need the first to form group ranges
CREATE VIEW beginnings AS
SELECT first.date
FROM raw_data first
LEFT OUTER JOIN raw_data prior_first ON first.date = prior_first.date + 1
WHERE prior_first.date IS NULL;
CREATE VIEW endings AS
SELECT last.date
FROM raw_data last
LEFT OUTER JOIN raw_data after_last ON last.date = after_last.date - 1
WHERE after_last.date IS NULL;
SELECT MIN(raw.date), MAX(raw.date), SUM(raw.value)
FROM raw_data raw
INNER JOIN (SELECT lo.date AS lo_date, MIN(hi.date) AS hi_date
            FROM beginnings lo, endings hi
            WHERE lo.date <= hi.date  -- <= so that a single-day run pairs with its own ending
            GROUP BY lo.date) range
ON raw.date >= range.lo_date AND raw.date <= range.hi_date
GROUP BY range.lo_date;
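As a rough cross-check of this beginnings/endings approach, here is a small Python sketch that follows the same steps in memory. It is only an illustration under assumptions: the function name `islands` and the `rows` list are made up for the example and are not part of the answer. A beginning is a date whose previous day is absent, an ending is a date whose next day is absent, and each beginning is paired with the nearest ending on or after it.

```python
from datetime import date, timedelta

# Sample data from the question, as (date, value) pairs.
rows = [
    (date(2011, 10, 31), 2),
    (date(2011, 11, 1), 8),
    (date(2011, 11, 2), 10),
    (date(2012, 9, 13), 1),
    (date(2012, 9, 14), 4),
    (date(2012, 9, 15), 5),
    (date(2012, 9, 16), 20),
    (date(2012, 10, 30), 10),
]


def islands(rows):
    """Return (first_date, last_date, sum) per run, via beginnings and endings."""
    present = {d for d, _ in rows}
    one_day = timedelta(days=1)
    # Beginnings: no row exists for the day before.
    beginnings = sorted(d for d in present if d - one_day not in present)
    # Endings: no row exists for the day after.
    endings = sorted(d for d in present if d + one_day not in present)

    result = []
    for lo in beginnings:
        # Inclusive comparison, so a one-day run is paired with itself.
        hi = min(e for e in endings if e >= lo)
        total = sum(v for d, v in rows if lo <= d <= hi)
        result.append((lo, hi, total))
    return result


for lo, hi, total in islands(rows):
    print(lo, hi, total)
```

In this sketch the comparison between a beginning and its ending is inclusive; with a strict comparison, the single-row run on 2012-10-30 would find no matching ending and drop out of the result.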
Source: https://stackoverflow.com/questions/13009893/group-by-consecutive-dates-delimited-by-gaps