select *
from records
where id in ( select max(id) from records group by option_id )
This query works fine even on millions of rows. However as y
Assuming relatively few rows in options for many rows in records.
Typically, you would have a look-up table options that is referenced from records.option_id, ideally with a foreign key constraint. If you don't, I suggest to create one to enforce referential integrity:
CREATE TABLE options (
option_id int PRIMARY KEY
, option text UNIQUE NOT NULL
);
INSERT INTO options
SELECT DISTINCT option_id, 'option' || option_id -- dummy option names
FROM records;
Then there is no need to emulate a loose index scan any more and this becomes very simple and fast. Correlated subqueries can use a plain index on (option_id, id).
SELECT option_id, (SELECT max(id)
FROM records
WHERE option_id = o.option_id) AS max_id
FROM options o
ORDER BY 1;
This includes options with no match in table records. You get NULL for max_id and you can easily remove such rows in an outer SELECT if needed.
Or (same result):
SELECT option_id, (SELECT id
FROM records
WHERE option_id = o.option_id
ORDER BY id DESC NULLS LAST
LIMIT 1) AS max_id
FROM options o
ORDER BY 1;
May be slightly faster. The subquery uses the sort order DESC NULLS LAST - same as the aggregate function max() which ignores NULL values. Sorting just DESC would have NULL first:
The perfect index for this:
CREATE INDEX on records (option_id, id DESC NULLS LAST);
Index sort order doesn't matter much while columns are defined NOT NULL.
There can still be a sequential scan on the small table options, that's just the fastest way to fetch all rows. The ORDER BY may bring in an index (only) scan to fetch pre-sorted rows.
The big table records is only accessed via (bitmap) index scan or, if possible, index-only scan.
db<>fiddle here - showing two index-only scans for the simple case
Old sqlfiddle
Or use LATERAL joins for a similar effect in Postgres 9.3+: