How does the optimizer decide between merge join and hash join?

Question

Database System Concepts introduces several ways to implement a join operation. Two of them are merge join and hash join.

When does the optimizer choose a merge join, and when a hash join?
In particular, https://stackoverflow.com/a/1114288/156458 says:

hash joins can only be used for equi-joins, but merge joins are more flexible.

But Database System Concepts says both are used only for equi-joins and natural joins:

The merge-join algorithm (also called the sort-merge-join algorithm) can be used to compute natural joins and equi-joins.

...

Like the merge-join algorithm, the hash-join algorithm can be used to implement natural joins and equi-joins.
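To make the two quoted algorithms concrete, here is a minimal in-memory sketch of each in Python. It is only an illustration under simplifying assumptions (rows are dicts, everything fits in memory, the join is a plain equality on one column), not how PostgreSQL or the textbook implements them. The function names `hash_join` and `merge_join` are hypothetical.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, key):
    """Equi-join sketch: hash one input on the key, then probe with the other."""
    table = defaultdict(list)
    for row in build_rows:
        table[row[key]].append(row)
    result = []
    for probe in probe_rows:
        # A hash lookup can only find exact matches, hence equality conditions.
        for build in table.get(probe[key], []):
            result.append((build, probe))
    return result


def merge_join(left_rows, right_rows, key):
    """Equi-join sketch: sort both inputs on the key, then advance two cursors."""
    left = sorted(left_rows, key=lambda r: r[key])
    right = sorted(right_rows, key=lambda r: r[key])
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key value.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row in the matching run with every right row in it.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    result.append((left[i], r))
                i += 1
            j = j_end
    return result
```

In this sketch the merge join pays for two sorts unless the inputs already arrive ordered (for example from an index scan), while the hash join pays for building a hash table on one input.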

Thanks.

My question comes from the PostgreSQL documentation, which has two examples. I am not sure why one uses a merge join and the other a hash join:

EXPLAIN SELECT *
FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Hash Join  (cost=230.47..713.98 rows=101 width=488)
   Hash Cond: (t2.unique2 = t1.unique2)
   ->  Seq Scan on tenk2 t2  (cost=0.00..445.00 rows=10000 width=244)
   ->  Hash  (cost=229.20..229.20 rows=101 width=244)
         ->  Bitmap Heap Scan on tenk1 t1  (cost=5.07..229.20 rows=101 width=244)
               Recheck Cond: (unique1 < 100)
               ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..5.04 rows=101 width=0)
                     Index Cond: (unique1 < 100)

and

EXPLAIN SELECT *
FROM tenk1 t1, onek t2
WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Merge Join  (cost=198.11..268.19 rows=10 width=488)
   Merge Cond: (t1.unique2 = t2.unique2)
   ->  Index Scan using tenk1_unique2 on tenk1 t1  (cost=0.29..656.28 rows=101 width=244)
         Filter: (unique1 < 100)
   ->  Sort  (cost=197.83..200.33 rows=1000 width=244)
         Sort Key: t2.unique2
         ->  Seq Scan on onek t2  (cost=0.00..148.00 rows=1000 width=244)

Source: https://stackoverflow.com/questions/50987379/how-does-the-optimizer-decide-between-merge-join-and-hash-join

Tags

postgresql

join

relational-database

query-optimization