Question
I have a table with id, year and count.
I want to get the MAX(count) for each id and keep the year when it happens, so I wrote this query:
SELECT id, year, MAX(count)
FROM table
GROUP BY id;
Unfortunately, it gives me an error:
ERROR: column "table.year" must appear in the GROUP BY clause or be used in an aggregate function
So I try:
SELECT id, year, MAX(count)
FROM table
GROUP BY id, year;
But then it doesn't do MAX(count); it just shows the table as it is. I suppose that is because, when grouping by year and id, it gets the max for each id within that specific year.
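A minimal sketch of that behaviour, assuming a hypothetical three-row data set (the name sample and its values are invented for illustration; "count" is quoted only to keep it visually distinct from the aggregate of the same name):

```sql
-- Hypothetical sample data standing in for the real table.
WITH sample(id, year, "count") AS (
    VALUES (1, 2010, 5),
           (1, 2011, 9),
           (2, 2010, 7)
)
SELECT id, year, MAX("count") AS max_count
FROM sample
GROUP BY id, year   -- every (id, year) pair forms its own single-row group
ORDER BY id, year;
-- All three input rows come back unchanged, because each group
-- contains exactly one row and MAX over one row is that row's value.
```

With this data the desired result would instead be one row per id: (1, 2011, 9) and (2, 2010, 7).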
So, how can I write that query? I want to get each id's MAX(count) and the year when that happens.
Answer 1:
select *
from (
   select id,
          year,
          thing,
          max(thing) over (partition by id) as max_thing
   from the_table
) t
where thing = max_thing
or:
select t1.id,
       t1.year,
       t1.thing
from the_table t1
where t1.thing = (select max(t2.thing)
                  from the_table t2
                  where t2.id = t1.id);
or
select t1.id,
       t1.year,
       t1.thing
from the_table t1
  join (
    select id, max(t2.thing) as max_thing
    from the_table t2
    group by id
  ) t on t.id = t1.id and t.max_thing = t1.thing
or (same as the previous with a different notation)
with max_stuff as (
  select id, max(t2.thing) as max_thing
  from the_table t2
  group by id
)
select t1.id,
       t1.year,
       t1.thing
from the_table t1
  join max_stuff t2
    on t1.id = t2.id
   and t1.thing = t2.max_thing
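The variants above all compare each row against the per-id maximum, so an id whose maximum occurs in more than one year yields more than one row. A small sketch of this with the window-function variant, assuming that thing plays the role of the question's count column and using invented sample values:

```sql
-- Hypothetical data: id 1 reaches its maximum (9) in two different years.
WITH the_table(id, year, thing) AS (
    VALUES (1, 2010, 9),
           (1, 2012, 9),
           (1, 2011, 4),
           (2, 2010, 7)
)
SELECT id, year, thing
FROM (
    SELECT id,
           year,
           thing,
           max(thing) OVER (PARTITION BY id) AS max_thing
    FROM the_table
) t
WHERE thing = max_thing   -- keeps every row that equals the per-id maximum
ORDER BY id, year;
-- id 1 appears twice (2010 and 2012); id 2 appears once (2010).
```

Whether returning all tied rows is desirable depends on the use case.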
Answer 2:
The shortest (and possibly fastest) query would be with DISTINCT ON, a PostgreSQL extension of the SQL standard DISTINCT clause:
SELECT DISTINCT ON (1)
       id, count, year
FROM   tbl
ORDER  BY 1, 2 DESC, 3;
The numbers refer to ordinal positions in the SELECT list. You can spell out column names for clarity:
SELECT DISTINCT ON (id)
       id, count, year
FROM   tbl
ORDER  BY id, count DESC, year;
The result is ordered by id, which may or may not be welcome. It's better than "undefined" in any case.
It also breaks ties (when multiple years share the same maximum count) in a well-defined way: pick the earliest year. If you don't care, drop year from the ORDER BY. Or pick the latest year with year DESC.
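A short sketch of the tie-breaking described above, assuming a hypothetical data set in which id 1 reaches its maximum count in two years (the values are invented; "count" is quoted here purely as a precaution):

```sql
-- Hypothetical data: id 1 has count = 9 in both 2010 and 2012.
WITH tbl(id, "count", year) AS (
    VALUES (1, 9, 2012),
           (1, 9, 2010),
           (1, 4, 2011),
           (2, 7, 2010)
)
SELECT DISTINCT ON (id)
       id, "count", year
FROM   tbl
ORDER  BY id, "count" DESC, year;   -- highest count first, then earliest year
-- Returns one row per id: (1, 9, 2010) and (2, 7, 2010).
```

Changing the last sort key to year DESC would return (1, 9, 2012) for id 1 instead.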
More explanation, links, a benchmark and possibly faster solutions in this closely related answer:
- Select first row in each GROUP BY group?
Aside: In a real-life query, you wouldn't use some of these column names. id is a non-descriptive anti-pattern for a column name, and count is a reserved word in standard SQL and an aggregate function in Postgres.
Source: https://stackoverflow.com/questions/13325583/postgresql-max-and-group-by