Optimize groupwise maximum query

前端未结

关注

 4  737

温柔的废话 2020-11-30 14:21

select * 
from records 
where id in ( select max(id) from records group by option_id )

This query works fine even on millions of rows. However as y

4条回答

醉梦人生 (楼主)

2020-11-30 14:31
Assuming relatively few rows in options for many rows in records.

Typically, you would have a look-up table options that is referenced from records.option_id, ideally with a foreign key constraint. If you don't, I suggest to create one to enforce referential integrity:
```
CREATE TABLE options (
  option_id int  PRIMARY KEY
, option    text UNIQUE NOT NULL
);

INSERT INTO options
SELECT DISTINCT option_id, 'option' || option_id -- dummy option names
FROM   records;
```
Then there is no need to emulate a loose index scan any more and this becomes very simple and fast. Correlated subqueries can use a plain index on (option_id, id).
```
SELECT option_id, (SELECT max(id)
                   FROM   records
                   WHERE  option_id = o.option_id) AS max_id
FROM   options o
ORDER  BY 1;
```
This includes options with no match in table records. You get NULL for max_id and you can easily remove such rows in an outer SELECT if needed.

Or (same result):
```
SELECT option_id, (SELECT id
                   FROM   records
                   WHERE  option_id = o.option_id
                   ORDER  BY id DESC NULLS LAST
                   LIMIT  1) AS max_id
FROM   options o
ORDER  BY 1;
```
May be slightly faster. The subquery uses the sort order DESC NULLS LAST - same as the aggregate function max() which ignores NULL values. Sorting just DESC would have NULL first:
- Why do NULL values come first when ordering DESC in a PostgreSQL query?
The perfect index for this:
```
CREATE INDEX on records (option_id, id DESC NULLS LAST);
```
Index sort order doesn't matter much while columns are defined NOT NULL.

There can still be a sequential scan on the small table options, that's just the fastest way to fetch all rows. The ORDER BY may bring in an index (only) scan to fetch pre-sorted rows.
The big table records is only accessed via (bitmap) index scan or, if possible, index-only scan.

db<>fiddle here - showing two index-only scans for the simple case
_{Old sqlfiddle}

Or use LATERAL joins for a similar effect in Postgres 9.3+:
- Optimize GROUP BY query to retrieve latest row per user
0 讨论(0)

查看其它4个回答
发布评论:

提交评论
- 加载中...