Why does 'HASH JOIN' or 'LOOP JOIN' improve this stored proc?

问题

I have a basic query that goes from 6 seconds to 1 second just by changing one join from LEFT JOIN to LEFT HASH JOIN or 'LEFT LOOP JOIN'. Can anyone explain why this would cause such a large increase in performance and why SQL's optimizer isn't figuring it out on it's own?

Here is roughly what the SQL looks like:

SELECT
   a.[ID]
FROM
   [TableA] a
LEFT HASH JOIN
   [TableB] b
   ON b.[ID] = a.[TableB_ID]
JOIN
   [TableC] c
   ON c.[ID] = a.[TableC_ID]
WHERE
   a.[SomeDate] IS NULL AND
   a.[SomeStatus] IN ('X', 'Y', 'Z') AND
   c.[SomethingElse] = 'ABC'

Table A and B have millions of records and indexes on all the ID fields. Using SQL Server 2005.

Edit: A collegue suggested a LEFT LOOP JOIN and it seems to have made it even faster... SQL is not one of my strengths so I am trying to understand how these 'hints' are helping.

回答1:

HASH JOIN is useful when the large percent of rows contributes to the resultset.

In your case, building a HASH TABLE on either A or B and scanning another table is cheaper than either performing NESTED LOOPS over the index on B.ID or merging the sorted resultsets which the optimizer used before the hint.

SQL Server's optimizer did not see that: probably because you didn't gather statistics, probably because your data distribution is skewed.

Update:

Since you mentioned that LOOP JOIN improved the speed, it may be so that the JOIN order was chosen incorrectly by the optimizer.

来源：https://stackoverflow.com/questions/1395582/why-does-hash-join-or-loop-join-improve-this-stored-proc

标签

sql