PostgreSQL: NOT IN versus EXCEPT performance difference (edited #2)

I have two queries that are functionally identical. One of them performs very well, the other one performs very poorly. I do not see from where the performance difference ar

5 Answers

南方客

2020-12-15 04:40

If id and position_id are both indexed (either on their own or first column in a multi-column index), then two index scans are all that are necessary - it's a trivial sorted-merge based set algorithm.
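To make the "sorted-merge based set algorithm" concrete, here is a minimal sketch in Python. It is not PostgreSQL's executor code; `sorted_difference` is a hypothetical helper, and the two lists merely stand in for the ordered output of two index scans. It assumes both inputs are sorted ascending and contain no NULLs.

```python
def sorted_difference(left, right):
    """Yield values from sorted `left` that do not occur in sorted `right`."""
    right_iter = iter(right)
    sentinel = object()  # marks exhaustion of the right-hand input
    current = next(right_iter, sentinel)
    for value in left:
        # Advance the right side until it catches up with `value`.
        while current is not sentinel and current < value:
            current = next(right_iter, sentinel)
        # Emit `value` only if the right side has no equal element.
        if current is sentinel or current != value:
            yield value


# Hypothetical data standing in for two ordered index scans.
ids = [1, 2, 3, 5, 8]        # subsource_position.id
position_ids = [2, 2, 5, 9]  # subsource.position_id (duplicates allowed)
missing = list(sorted_difference(ids, position_ids))
print(missing)
```

Each input is read once, so the work is linear in the combined length of the two lists. Unlike EXCEPT, this sketch does not remove duplicates from the left input.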

Personally I think PostgreSQL simply doesn't have the optimization intelligence to understand this.

(I came to this question after diagnosing a query that had been running for over 24 hours, which I could perform with sort x y y | uniq -u on the command line in seconds. The database was less than 50MB when exported with pg_dump.)

PS: more interesting comment here:

more work has been put into optimizing EXCEPT and NOT EXISTS than NOT IN, because the latter is substantially less useful due to its unintuitive but spec-mandated handling of NULLs. We're not going to apologize for that, and we're not going to regard it as a bug.

What it comes down to is that EXCEPT differs from NOT IN with respect to NULL handling. I haven't looked up the details, but it means PostgreSQL (aggressively) doesn't optimize NOT IN.
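The NULL difference can be seen on a tiny data set. The sketch below uses Python's built-in `sqlite3` module as a stand-in engine (an assumption made only so the example is self-contained; it says nothing about PostgreSQL's planner). The table and column names follow the thread, and the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subsource_position (id INTEGER);
    CREATE TABLE subsource (position_id INTEGER);
    INSERT INTO subsource_position VALUES (1), (2), (3);
    INSERT INTO subsource VALUES (1), (NULL);  -- note the NULL
""")

# NOT IN: "2 NOT IN (1, NULL)" evaluates to NULL, not true, so the row is dropped.
not_in_rows = conn.execute("""
    SELECT id FROM subsource_position
    WHERE id NOT IN (SELECT position_id FROM subsource)
""").fetchall()

# EXCEPT: rows are compared as set members, so the NULL removes nothing else.
except_rows = conn.execute("""
    SELECT id FROM subsource_position
    EXCEPT
    SELECT position_id FROM subsource
    ORDER BY id
""").fetchall()

print(not_in_rows)
print(except_rows)
```

With a single NULL in the subquery, NOT IN returns no rows at all, while EXCEPT still returns ids 2 and 3. That is why the two forms cannot be treated as interchangeable unless the column is known to be free of NULLs.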

故里飘歌

2020-12-15 04:41

Since you are running with the default configuration, try bumping up work_mem. Most likely, the subquery ends up getting spooled to disk because you only allow for 1MB of work memory. Try 10MB or 20MB.

清歌不尽

2020-12-15 04:44

The second query makes use of PostgreSQL's HASH JOIN feature. This is much faster than the Seq Scan of the first one.

名媛妹妹

2020-12-15 04:50
Query #1 is not an elegant way of doing this... (NOT) IN (SELECT ...) is fine for a few entries, but it can't use indexes (Seq Scan).

Before EXCEPT was available, this is how it was done using a JOIN (HASH JOIN):
```
    -- Anti-join: keep only the positions that have no matching subsource row.
    SELECT sp.id
    FROM subsource_position AS sp
        LEFT JOIN subsource AS s ON (s.position_id = sp.id)
    WHERE
        s.position_id IS NULL;
```
EXCEPT appeared in Postgres a long, long time ago... But in MySQL, for example, I believe this is still the only way to achieve this using index-based joins.
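As a quick check of the LEFT JOIN ... IS NULL pattern, the sketch below compares it with a NOT EXISTS formulation on made-up rows. It runs on Python's built-in `sqlite3` module rather than PostgreSQL or MySQL (an assumption for the sake of a self-contained example), so it illustrates the result sets only, not the query plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subsource_position (id INTEGER PRIMARY KEY);
    CREATE TABLE subsource (position_id INTEGER);
    INSERT INTO subsource_position (id) VALUES (1), (2), (3);
    INSERT INTO subsource (position_id) VALUES (1), (NULL);
""")

# Anti-join written as LEFT JOIN plus an IS NULL filter on the joined side.
left_join_rows = conn.execute("""
    SELECT sp.id
    FROM subsource_position AS sp
        LEFT JOIN subsource AS s ON s.position_id = sp.id
    WHERE s.position_id IS NULL
    ORDER BY sp.id
""").fetchall()

# The same anti-join written with a correlated NOT EXISTS.
not_exists_rows = conn.execute("""
    SELECT sp.id
    FROM subsource_position AS sp
    WHERE NOT EXISTS (
        SELECT 1 FROM subsource AS s WHERE s.position_id = sp.id
    )
    ORDER BY sp.id
""").fetchall()

print(left_join_rows)
print(not_exists_rows)
```

Both forms return the positions with no matching subsource row, and neither is affected by the NULL in subsource.position_id.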

广开言路

2020-12-15 04:59
Your queries are not functionally equivalent so any comparison of their query plans is meaningless.

Your first query is, in set theory terms, this:
```
{subsource.position_id} - {subsource_position.id}
          ^        ^                ^        ^
```
but your second is this:
```
{subsource_position.id} - {subsource.position_id}
          ^        ^                ^        ^
```
And A - B is not the same as B - A for arbitrary sets A and B.
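A two-line illustration with Python sets, using made-up values, shows how the direction of the difference changes the answer:

```python
a = {1, 2, 3}  # e.g. the values of subsource.position_id
b = {3, 4}     # e.g. the values of subsource_position.id

a_minus_b = a - b  # elements of a that are not in b
b_minus_a = b - a  # elements of b that are not in a

print(a_minus_b, b_minus_a)
```

The two results share no elements here, so a plan for one query says nothing about the cost of the other.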

Fix your queries to be semantically equivalent and try again.
