Why can't PostgreSQL do this simple FULL JOIN?

Submitted by 你。 on 2021-01-22 06:36:34

Question


Here's a minimal setup with two tables, a and b, each with 3 rows:

CREATE TABLE a (
    id SERIAL PRIMARY KEY,
    value TEXT
);
CREATE INDEX ON a (value);

CREATE TABLE b (
    id SERIAL PRIMARY KEY,
    value TEXT
);
CREATE INDEX ON b (value);

INSERT INTO a (value) VALUES ('x'), ('y'),        (NULL);
INSERT INTO b (value) VALUES        ('y'), ('z'), (NULL);

Here is a LEFT JOIN that works as expected:

SELECT * FROM a
LEFT JOIN b ON a.value IS NOT DISTINCT FROM b.value;

with output:

 id | value | id | value 
----+-------+----+-------
  1 | x     |    | 
  2 | y     |  1 | y
  3 |       |  3 | 
(3 rows)

Changing "LEFT JOIN" to "FULL JOIN" gives an error:

SELECT * FROM a
FULL JOIN b ON a.value IS NOT DISTINCT FROM b.value;

ERROR: FULL JOIN is only supported with merge-joinable or hash-joinable join conditions

Can someone please explain:

What is a "merge-joinable or hash-joinable join condition", and why does joining on a.value IS NOT DISTINCT FROM b.value not fulfill this condition, while a.value = b.value is perfectly fine?
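For contrast, here is the variant with plain = that is accepted, run against the same a and b tables. Text equality can drive a hash or merge join, so the planner has no complaint. It does treat NULLs differently, though: NULL = NULL is not true, so the two NULL rows stay unmatched and the query returns five rows (the y/y pair, plus x, z and the two NULL rows, each without a partner).

```sql
-- Accepted: "=" on text is hash- and merge-joinable.
-- The NULL row of a and the NULL row of b do NOT match each other here.
SELECT * FROM a
FULL JOIN b ON a.value = b.value;

-- Prefix with EXPLAIN to see which strategy was chosen
-- (look for a "Hash Cond" or "Merge Cond" line in the plan).
EXPLAIN SELECT * FROM a
FULL JOIN b ON a.value = b.value;
```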

It seems that the only difference is how NULL values are handled. The value column is indexed in both tables, and EXPLAIN shows that looking up NULL is just as efficient as looking up a non-NULL value:

EXPLAIN SELECT * FROM a WHERE value = 'x';
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Bitmap Heap Scan on a  (cost=4.20..13.67 rows=6 width=36)
   Recheck Cond: (value = 'x'::text)
   ->  Bitmap Index Scan on a_value_idx  (cost=0.00..4.20 rows=6 width=0)
         Index Cond: (value = 'x'::text)


EXPLAIN SELECT * FROM a WHERE value ISNULL;
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Bitmap Heap Scan on a  (cost=4.20..13.65 rows=6 width=36)
   Recheck Cond: (value IS NULL)
   ->  Bitmap Index Scan on a_value_idx  (cost=0.00..4.20 rows=6 width=0)
         Index Cond: (value IS NULL)

This has been tested with PostgreSQL 9.6.3 and 10beta1.

There has been discussion about this issue, but it doesn't directly answer the above question.


Answer 1:


PostgreSQL implements FULL OUTER JOIN with either a hash or a merge join.

To be eligible for such a join, the join condition has to have the form

<expression using only left table> <operator> <expression using only right table>

Now your join condition does look like this, but PostgreSQL does not have a special IS NOT DISTINCT FROM operator, so it parses your condition into:

(NOT ($1 IS DISTINCT FROM $2))

And such an expression cannot be used for hash or merge joins, hence the error message.
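Whether an operator may drive a merge join or a hash join is recorded per operator in the pg_operator system catalog, in the oprcanmerge and oprcanhash columns. As a small sketch, this query inspects those flags for the text = text operator used by a.value = b.value:

```sql
-- Join-related flags of the text equality operator.
SELECT oprname, oprcanmerge, oprcanhash
FROM pg_operator
WHERE oprname = '='
  AND oprleft = 'text'::regtype
  AND oprright = 'text'::regtype;
```

Both flags should come back true, which is why the plain = version qualifies for a FULL JOIN.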

I can think of a way to work around it:

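SELECT a_id, NULLIF(a_value, '<null>') AS a_value,   -- turn the sentinel back into NULL
       b_id, NULLIF(b_value, '<null>') AS b_value
FROM (SELECT id AS a_id,
             COALESCE(value, '<null>') AS a_value    -- map NULL to a sentinel string
      FROM a
     ) x
   FULL JOIN
     (SELECT id AS b_id,
             COALESCE(value, '<null>') AS b_value
      FROM b
     ) y
      ON x.a_value = y.b_value;                      -- plain "=" is hash/merge-joinable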

This works as long as the string '<null>' never occurs as a real value in either value column; otherwise such a row would wrongly match NULLs on the other side.
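If no sentinel string is guaranteed to be unused, a possible variant (a sketch, assuming the planner treats both clauses as ordinary joinable equalities) splits the null-safe comparison into two plain = comparisons: one on whether the value is NULL, and one on the value with NULL mapped to ''. Each side of each = refers to only one table, so both clauses should be hash- and merge-joinable. A genuine empty string no longer collides with NULL, because its IS NULL flag differs. With the question's sample data this should return four rows: the y/y and NULL/NULL pairs, plus x and z unmatched.

```sql
-- Null-safe FULL JOIN built only from plain "=" comparisons.
SELECT a.id AS a_id, a.value AS a_value,
       b.id AS b_id, b.value AS b_value
FROM a
FULL JOIN b
  -- both NULL or both non-NULL ...
  ON  (a.value IS NULL) = (b.value IS NULL)
  -- ... and equal once NULL is mapped to ''; the flag above keeps
  -- a real '' from matching a NULL
  AND COALESCE(a.value, '') = COALESCE(b.value, '');
```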




Answer 2:


I just solved such a case by replacing the ON condition with "TRUE", and moving the original "ON" condition into a WHERE clause. I don't know the performance impact of this, though.
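One caveat, which can be checked against the sample data from the question: when both tables are non-empty, FULL JOIN ... ON TRUE pairs every row of a with every row of b, so no row is left unmatched and the join is effectively a cross join. The WHERE clause then keeps only the matching pairs, which gives inner-join rather than full-join semantics. Where the unmatched rows are needed, one common way to emulate the full join, sketched below with the question's a and b tables, is a LEFT JOIN plus the rows of b that have no partner in a:

```sql
-- The ON TRUE approach: only the y/y and NULL/NULL pairs survive;
-- the unmatched x and z rows are lost.
SELECT * FROM a
FULL JOIN b ON TRUE
WHERE a.value IS NOT DISTINCT FROM b.value;

-- FULL JOIN emulation: every row of a, matched or not ...
SELECT a.id AS a_id, a.value AS a_value,
       b.id AS b_id, b.value AS b_value
FROM a
LEFT JOIN b ON a.value IS NOT DISTINCT FROM b.value
UNION ALL
-- ... plus the rows of b that match no row of a.
SELECT NULL, NULL, b.id, b.value
FROM b
WHERE NOT EXISTS (
    SELECT 1 FROM a WHERE a.value IS NOT DISTINCT FROM b.value
);
```

With the sample data, the first query returns two rows and the emulation returns four.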



Source: https://stackoverflow.com/questions/44231558/why-cant-postgresql-do-this-simple-full-join
