How does a hash full outer join work?

Question

I know the algorithm for a hash left outer join: build a hash table on the right table, then loop through the left table and probe the hash table for a match. But how does a full outer join work? After you scan through the values in the left table, you would still need a way to get the tuples in the right table that didn't have matches in the left.

Answer 1:

While looping through the probe (left) records, you record which tuples in the build (right) table have found a match: you just set a boolean to true on each build tuple that matched. As a final pass in the algorithm, you scan the build table and output all tuples whose flag was never set, padded with NULLs on the left side.
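A minimal Python sketch of that strategy, for illustration only. The function name `hash_full_outer_join` and the key-extractor parameters are made up here, `None` stands in for SQL NULL padding, and join keys are assumed to be non-null and hashable. A real engine would also deal with spilling to disk, which this ignores.

```python
from collections import defaultdict

def hash_full_outer_join(left, right, left_key, right_key):
    """Full outer join: build on `right`, probe with `left` (illustrative sketch)."""
    # Build phase: hash every right tuple and attach a "matched" flag.
    build = defaultdict(list)
    for r in right:
        build[right_key(r)].append([r, False])  # [tuple, matched?]

    result = []
    # Probe phase: stream the left input through the hash table.
    for l in left:
        entries = build.get(left_key(l))  # .get() does not create an empty bucket
        if entries:
            for entry in entries:
                entry[1] = True               # remember that this right tuple matched
                result.append((l, entry[0]))
        else:
            result.append((l, None))          # left tuple without a partner

    # Final pass: emit the right tuples that never matched.
    for entries in build.values():
        for r, matched in entries:
            if not matched:
                result.append((None, r))
    return result
```

For example, joining `[(1, "a"), (2, "b")]` with `[(2, "x"), (3, "y")]` on the first field gives one matched pair for key 2, a left-only row for key 1, and a right-only row for key 3.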

There is an alternate strategy which, as far as I'm aware, is not used in RDBMSs: build a combined hash table of left and right tuples. Treat that table as a map from join key to a list of left tuples plus a list of right tuples. Build it by looping through both input tables and adding every tuple to the hash table. After all tuples have been consumed, iterate over the hash table once and output each equality group accordingly: all left tuples (if the group has no right tuples), all right tuples (if it has no left tuples), or the cross product of the group's left and right tuples.
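The combined-table variant could be sketched like this in Python. Again, the name `grouped_full_outer_join` and its parameters are hypothetical, `None` stands in for NULL padding, and keys are assumed to be non-null and hashable.

```python
from collections import defaultdict
from itertools import product

def grouped_full_outer_join(left, right, left_key, right_key):
    """Full outer join via one hash table holding both inputs (illustrative sketch)."""
    # Map each join key to a pair: (list of left tuples, list of right tuples).
    groups = defaultdict(lambda: ([], []))
    for l in left:
        groups[left_key(l)][0].append(l)
    for r in right:
        groups[right_key(r)][1].append(r)

    result = []
    # One pass over the equality groups.
    for lefts, rights in groups.values():
        if lefts and rights:
            result.extend(product(lefts, rights))             # cross product of the group
        elif lefts:
            result.extend((l, None) for l in lefts)           # left-only group
        else:
            result.extend((None, r) for r in rights)          # right-only group
    return result
```

Note that this version holds both inputs in memory before producing any output, whereas the flag-based version only has to hold the build side.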

The latter algorithm is nice for in-memory workloads (like in client applications). The former only has to keep the build side in the hash table and streams the probe side, so it copes well with an extremely (or unpredictably) large probe input, which is why RDBMSs use that one.

Source: https://stackoverflow.com/questions/13436236/how-does-a-hash-full-outer-join-work

Tags

database

join

hash

outer-join