PostgreSQL: NOT IN versus EXCEPT performance difference (edited #2)

萌比男神i 2020-12-15 04:01

I have two queries that are functionally identical. One of them performs very well; the other performs very poorly. I do not see where the performance difference arises.

5 answers
  • 2020-12-15 04:40

    If id and position_id are both indexed (either on their own or as the first column of a multi-column index), then two index scans are all that is necessary; it's a trivial set algorithm based on merging two sorted inputs.

    Personally I think PostgreSQL simply doesn't have the optimization intelligence to understand this.
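    The merge-based set algorithm mentioned above can be sketched in a few lines. This is a minimal Python illustration, not PostgreSQL's executor code: the function name `sorted_difference` and the sample values are hypothetical, the two inputs stand in for the output of the two index scans, and both are assumed to be sorted ascending and free of NULLs. Each input is read exactly once, so the work is linear in the combined size.

```python
from typing import Iterable, Iterator


def sorted_difference(left: Iterable[int], right: Iterable[int]) -> Iterator[int]:
    """Merge anti-join: yield values of `left` that do not appear in `right`.

    Assumes both inputs are sorted ascending (as an index scan returns them)
    and contain no None values. Each input is consumed exactly once.
    """
    right_iter = iter(right)
    current = next(right_iter, None)  # None marks "right side exhausted"
    for value in left:
        # Advance the right side past every entry smaller than `value`.
        while current is not None and current < value:
            current = next(right_iter, None)
        # Keep `value` only if the right side has no equal entry.
        if current is None or current != value:
            yield value


# Hypothetical subsource_position.id and subsource.position_id values.
position_ids = [1, 2, 3, 5, 8]
referenced = [2, 2, 5, 9]
print(list(sorted_difference(position_ids, referenced)))  # [1, 3, 8]
```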

    (I came to this question after diagnosing a query that had been running for over 24 hours, when I could do the same thing with sort x y y | uniq -u on the command line in seconds. The database was less than 50 MB when exported with pg_dump.)
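    The sort x y y | uniq -u trick works because every line of y appears at least twice in the combined input, and uniq -u (after sort has made equal lines adjacent) keeps only lines that occur exactly once; so only lines of x that are absent from y survive. One caveat: a line duplicated within x itself is dropped too, so x should be deduplicated first. The sketch below mimics the pipeline with `collections.Counter`; the function name `sort_uniq_u` and the data are hypothetical, and the result is returned in Python's code-point order, which may differ from sort's locale-dependent collation.

```python
from collections import Counter


def sort_uniq_u(x: list[str], y: list[str]) -> list[str]:
    """Mimic `sort x y y | uniq -u`: keep lines that occur exactly once."""
    # Concatenate x with two copies of y, as the shell command does.
    counts = Counter(x + y + y)
    # Lines from y now occur at least twice, so only lines unique to x
    # (and not repeated within x) can have a count of exactly 1.
    return sorted(line for line, n in counts.items() if n == 1)


print(sort_uniq_u(["1", "2", "3", "5"], ["2", "5", "9"]))  # ['1', '3']
print(sort_uniq_u(["4", "4"], []))  # [] because "4" is repeated within x
```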

    PS: here is a more interesting comment on this:

    more work has been put into optimizing EXCEPT and NOT EXISTS than NOT IN, because the latter is substantially less useful due to its unintuitive but spec-mandated handling of NULLs. We're not going to apologize for that, and we're not going to regard it as a bug.

    What it comes down to is that EXCEPT differs from NOT IN in how it handles NULLs. I haven't looked up the details, but it means PostgreSQL doesn't optimize NOT IN as aggressively.
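    The NULL difference is easy to see on a tiny data set. This sketch uses Python's built-in `sqlite3` module instead of PostgreSQL only because it needs no server; SQLite applies the same SQL-standard three-valued logic to NOT IN, but the sketch says nothing about PostgreSQL's plans or speed. The table and column names follow the thread; the function name `compare_anti_joins` and the rows are hypothetical. Once subsource.position_id contains a NULL, id NOT IN (...) evaluates to NULL rather than TRUE for every id without a match, so that query returns nothing, while NOT EXISTS and EXCEPT still return the unreferenced ids.

```python
import sqlite3


def compare_anti_joins(position_ids, referenced_ids):
    """Run NOT IN, NOT EXISTS and EXCEPT against the same tiny data set."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE subsource_position (id INTEGER PRIMARY KEY)")
    con.execute("CREATE TABLE subsource (position_id INTEGER)")  # nullable
    con.executemany("INSERT INTO subsource_position VALUES (?)",
                    [(i,) for i in position_ids])
    con.executemany("INSERT INTO subsource VALUES (?)",
                    [(i,) for i in referenced_ids])
    queries = {
        "not_in": """SELECT id FROM subsource_position
                     WHERE id NOT IN (SELECT position_id FROM subsource)""",
        "not_exists": """SELECT id FROM subsource_position AS sp
                         WHERE NOT EXISTS (SELECT 1 FROM subsource AS s
                                           WHERE s.position_id = sp.id)""",
        "except": """SELECT id FROM subsource_position
                     EXCEPT
                     SELECT position_id FROM subsource""",
    }
    results = {name: sorted(row[0] for row in con.execute(sql))
               for name, sql in queries.items()}
    con.close()
    return results


print(compare_anti_joins([1, 2, 3], [2]))
# {'not_in': [1, 3], 'not_exists': [1, 3], 'except': [1, 3]}
print(compare_anti_joins([1, 2, 3], [2, None]))
# {'not_in': [], 'not_exists': [1, 3], 'except': [1, 3]}
```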

  • 2020-12-15 04:41

    Since you are running with the default configuration, try increasing work_mem. Most likely, the subquery result ends up being spooled to disk because you only allow 1 MB of work memory. Try 10 or 20 MB.
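    Why memory matters so much here: as far as I know, PostgreSQL evaluates NOT IN (subquery) with a hashed subplan (build a hash table of the subquery result once, then probe it for each outer row) only when it estimates that the table fits in work_mem; otherwise it rescans the subquery result for every outer row. The Python sketch below is a simplified cost model of those two strategies, not PostgreSQL code: the function names and data are hypothetical, NULL handling is ignored, and it just counts comparisons or probes.

```python
def rescan_not_in(outer, inner):
    """Non-hashed strategy: rescan `inner` for every outer row."""
    kept, comparisons = [], 0
    for value in outer:
        found = False
        for candidate in inner:  # full rescan of the subquery result
            comparisons += 1
            if candidate == value:
                found = True
                break
        if not found:
            kept.append(value)
    return kept, comparisons


def hashed_not_in(outer, inner):
    """Hashed strategy: build a hash set once, then probe it per row."""
    table = set(inner)  # built once; has to fit in memory
    kept, probes = [], 0
    for value in outer:
        probes += 1  # one hash lookup per outer row
        if value not in table:
            kept.append(value)
    return kept, probes


outer = list(range(1000))
inner = list(range(0, 2000, 2))  # even numbers only
slow, slow_cost = rescan_not_in(outer, inner)
fast, fast_cost = hashed_not_in(outer, inner)
print(slow == fast, slow_cost, fast_cost)  # True 625250 1000
```

    Both strategies return the same rows, but the rescanning one does work proportional to outer rows times inner rows. To try a bigger limit for one session only, SET work_mem = '20MB'; in psql avoids editing postgresql.conf.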

  • 2020-12-15 04:44

    The second query makes use of PostgreSQL's HASH JOIN. This is much faster than the Seq Scan in the first one.

  • 2020-12-15 04:50

    Query #1 is not an elegant way to do this. (NOT) IN (SELECT ...) is fine for a few entries, but it can't use indexes (hence the Seq Scan).

    Before EXCEPT was available, this is how it was done with a JOIN (a HASH JOIN):

        -- Anti-join: keep positions that no subsource row references
        SELECT sp.id
        FROM subsource_position AS sp
            LEFT JOIN subsource AS s ON (s.position_id = sp.id)
        WHERE
            s.position_id IS NULL
    

    EXCEPT appeared in Postgres a long, long time ago. But in MySQL, for example, I believe this is still the only way to achieve this with an index-backed join.
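    One more difference between these forms: plain EXCEPT (without ALL) has set semantics and removes duplicate rows, while the LEFT JOIN ... IS NULL anti-join keeps every qualifying outer row. Like NOT EXISTS, the join form is unaffected by a NULL in subsource.position_id, because a NULL never satisfies the ON condition. If id is a key this makes no practical difference; the sketch below assumes, hypothetically, that subsource_position.id is not unique so the effect shows. It again uses Python's `sqlite3` as a stand-in for PostgreSQL; the function name and rows are made up.

```python
import sqlite3


def left_join_vs_except(position_ids, referenced_ids):
    """Compare the LEFT JOIN ... IS NULL anti-join with EXCEPT."""
    con = sqlite3.connect(":memory:")
    # Hypothetical: id is NOT declared unique, so duplicates can exist.
    con.execute("CREATE TABLE subsource_position (id INTEGER)")
    con.execute("CREATE TABLE subsource (position_id INTEGER)")
    con.executemany("INSERT INTO subsource_position VALUES (?)",
                    [(i,) for i in position_ids])
    con.executemany("INSERT INTO subsource VALUES (?)",
                    [(i,) for i in referenced_ids])
    left_join = """SELECT sp.id
                   FROM subsource_position AS sp
                       LEFT JOIN subsource AS s ON (s.position_id = sp.id)
                   WHERE s.position_id IS NULL"""
    except_query = """SELECT id FROM subsource_position
                      EXCEPT
                      SELECT position_id FROM subsource"""
    result = (sorted(r[0] for r in con.execute(left_join)),
              sorted(r[0] for r in con.execute(except_query)))
    con.close()
    return result


print(left_join_vs_except([1, 1, 2, 3], [2, None]))  # ([1, 1, 3], [1, 3])
```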

  • 2020-12-15 04:59

    Your queries are not functionally equivalent, so any comparison of their query plans is meaningless.

    Your first query is, in set theory terms, this:

    {subsource.position_id} - {subsource_position.id}

    but your second is this:

    {subsource_position.id} - {subsource.position_id}

    And A - B is not the same as B - A for arbitrary sets A and B.

    Fix your queries to be semantically equivalent and try again.
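    A quick check with Python sets makes the asymmetry concrete; the values are hypothetical stand-ins for the two columns.

```python
# Hypothetical contents of the two columns.
sp_ids = {1, 2, 3}          # subsource_position.id
s_position_ids = {2, 3, 4}  # subsource.position_id

# {subsource_position.id} - {subsource.position_id}
print(sp_ids - s_position_ids)  # {1}
# {subsource.position_id} - {subsource_position.id}
print(s_position_ids - sp_ids)  # {4}
```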
