I have two queries that are functionally identical. One of them performs very well, the other one performs very poorly. I do not see where the performance difference comes from.
If id and position_id are both indexed (either on their own or as the first column in a multi-column index), then two index scans are all that is needed - it's a trivial sorted-merge-based set algorithm.
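As a sketch of what I mean (the table names a and b are hypothetical, only the column names id and position_id come from the question):

    -- hypothetical tables; the indexes are what would make a merge-style plan possible
    CREATE TABLE a (id integer);
    CREATE TABLE b (position_id integer);
    CREATE INDEX a_id_idx ON a (id);
    CREATE INDEX b_position_id_idx ON b (position_id);

    -- the set difference: rows of a whose id has no match in b
    SELECT id FROM a
    EXCEPT
    SELECT position_id FROM b;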
Personally I think PostgreSQL simply doesn't have the optimization intelligence to understand this.
(I came to this question after diagnosing a query that had been running for over 24 hours, when the same set difference could be computed in seconds on the command line with sort x y y | uniq -u - listing y twice means every line from y appears more than once, so uniq -u keeps only the lines unique to x. The database was less than 50 MB when exported with pg_dump.)
PS: a more interesting comment here:
"more work has been put into optimizing EXCEPT and NOT EXISTS than NOT IN, because the latter is substantially less useful due to its unintuitive but spec-mandated handling of NULLs. We're not going to apologize for that, and we're not going to regard it as a bug."
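Since that comment points at NOT EXISTS as the well-optimized form, the rewrite being suggested looks roughly like this (same hypothetical table and column names as above):

    -- anti-join form that the planner handles well, unlike NOT IN
    SELECT a.id
    FROM a
    WHERE NOT EXISTS (
        SELECT 1 FROM b WHERE b.position_id = a.id
    );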
What it comes down to is that except is different from not in with respect to NULL handling. I haven't looked up the details, but it means PostgreSQL (quite deliberately) doesn't optimize it.
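To make the NULL difference concrete, here is a minimal sketch (again with hypothetical table names) of the case where the two forms disagree:

    -- one NULL in the subquery's column is enough to change NOT IN's answer
    CREATE TEMP TABLE a (id integer);
    CREATE TEMP TABLE b (position_id integer);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (1), (NULL);

    -- NOT IN: "2 NOT IN (1, NULL)" evaluates to UNKNOWN, not true, so this returns no rows
    SELECT id FROM a WHERE id NOT IN (SELECT position_id FROM b);

    -- EXCEPT: treats NULL as just another value when removing matches, so this returns 2
    SELECT id FROM a
    EXCEPT
    SELECT position_id FROM b;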