I have a table core_message in Postgres, with millions of rows, that looks like this (simplified):
┌────────────────┬──
You have put existing answers to good use and come up with great solutions in your own answer. Some missing pieces:
I'm still trying to understand how to properly use his first RECURSIVE solution ...
You used this query to create the test_boats table with unique mmsi:
select distinct on (mmsi) mmsi from core_message
For many rows per boat (mmsi), use this faster RECURSIVE solution instead:
WITH RECURSIVE cte AS (
   (
   SELECT mmsi
   FROM core_message
   ORDER BY mmsi
   LIMIT 1
   )
   UNION ALL
   SELECT m.*
   FROM cte c
   CROSS JOIN LATERAL (
      SELECT mmsi
      FROM core_message
      WHERE mmsi > c.mmsi
      ORDER BY mmsi
      LIMIT 1
      ) m
   )
TABLE cte;
This hardly gets any slower with more rows per boat, as opposed to DISTINCT ON, which is typically faster with only few rows per boat. Either query only needs an index with mmsi as leading column to be fast.
If possible, create that boats table and add a FK constraint to it. (Means you have to maintain it.) Then you can go on using the optimal LATERAL query you have in your answer and never miss any boats. (Orphaned boats may be worth tracking / removing in the long run.)
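A minimal sketch of that setup, assuming the table is simply named boats with mmsi as its only column and primary key, and that mmsi is an integer type (the column type is not shown in the question, so adjust as needed). The constraint name is hypothetical, and the final LATERAL query only illustrates the pattern; the one in your answer may differ in detail:

```sql
-- Assumption: mmsi is an integer type; adjust to match core_message.mmsi.
CREATE TABLE boats (
   mmsi int PRIMARY KEY
);

-- One-time population from existing messages.
-- With many rows per boat, the RECURSIVE query above is the faster source for this list.
INSERT INTO boats (mmsi)
SELECT DISTINCT mmsi
FROM   core_message
WHERE  mmsi IS NOT NULL;

-- From now on, every message must reference a known boat.
ALTER TABLE core_message
   ADD CONSTRAINT core_message_mmsi_fkey  -- hypothetical name
   FOREIGN KEY (mmsi) REFERENCES boats (mmsi);

-- Latest message per boat: one index probe per row in boats.
SELECT m.*
FROM   boats b
CROSS  JOIN LATERAL (
   SELECT *
   FROM   core_message
   WHERE  mmsi = b.mmsi
   ORDER  BY "time" DESC
   LIMIT  1
   ) m;
```

Boats without any message drop out of this result. Use LEFT JOIN LATERAL ... ON true instead of CROSS JOIN LATERAL if you want to keep them.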
Else, another iteration of that RECURSIVE query is the next best thing to get whole rows for the latest position of each boat quickly:
WITH RECURSIVE cte AS (
   (
   SELECT *
   FROM core_message
   ORDER BY mmsi DESC, time DESC -- see below
   LIMIT 1
   )
   UNION ALL
   SELECT m.*
   FROM cte c
   CROSS JOIN LATERAL (
      SELECT *
      FROM core_message
      WHERE mmsi < c.mmsi
      ORDER BY mmsi DESC, time DESC
      LIMIT 1
      ) m
   )
TABLE cte;
You have both of these indexes:
"core_message_uniq_mmsi_time" UNIQUE CONSTRAINT, btree (mmsi, "time")
"core_messag_mmsi_b36d69_idx" btree (mmsi, "time" DESC)
A UNIQUE constraint is implemented with a unique b-tree index that has all columns in default ASC sort order. That cannot be changed. If you don't actually need the constraint, you might replace it with a UNIQUE index, mostly achieving the same. But there you can add any sort order you like.
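As a sketch of such a replacement, assuming nothing else depends on the constraint (for example, no foreign key references it) and using a hypothetical index name:

```sql
BEGIN;

ALTER TABLE core_message
   DROP CONSTRAINT core_message_uniq_mmsi_time;

-- Hypothetical name. A unique index enforces uniqueness like the constraint did,
-- but allows a sort order per column.
CREATE UNIQUE INDEX core_message_mmsi_time_desc_uniq
   ON core_message (mmsi, "time" DESC);

COMMIT;
```

On a table with millions of rows, building the index takes a while and blocks writes in the meantime. CREATE UNIQUE INDEX CONCURRENTLY avoids blocking writes, but it cannot run inside a transaction block.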
But there is no need for that in the use case at hand. Postgres can scan a b-tree index backwards at practically the same speed. And I see nothing here that would require inverted sort order for the two columns. The additional index core_messag_mmsi_b36d69_idx is expensive dead freight - unless you have other use cases that actually need it.
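One way to check whether the extra index has other uses before dropping it is to look at its scan counter in the statistics view pg_stat_user_indexes. A sketch only: the counter reflects activity since statistics were last reset, so a low number is a hint, not proof.

```sql
-- How often has each index on core_message been scanned?
SELECT indexrelname, idx_scan
FROM   pg_stat_user_indexes
WHERE  relname = 'core_message'
ORDER  BY indexrelname;

-- Only if core_messag_mmsi_b36d69_idx turns out to be unused:
DROP INDEX core_messag_mmsi_b36d69_idx;
```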
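To best use the index core_message_uniq_mmsi_time from the UNIQUE constraint, I step through both columns in descending order. That matters: ORDER BY mmsi DESC, time DESC is the exact reverse of the index order (mmsi, time), so Postgres can read the first qualifying row with a backward index scan. With mixed directions, that index could not deliver the full sort order directly.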
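To verify this on your installation, you can run EXPLAIN on the non-recursive part of the query. This is only a sketch of what to look for; the actual plan depends on your data, statistics and Postgres version:

```sql
EXPLAIN
SELECT *
FROM   core_message
ORDER  BY mmsi DESC, "time" DESC
LIMIT  1;
```

If the plan reports an Index Scan Backward using core_message_uniq_mmsi_time, the index from the constraint is being used as intended.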