Why does 'HASH JOIN' or 'LOOP JOIN' improve this stored proc?

爱⌒轻易说出口 submitted on 2019-12-12 08:46:50

Question


I have a basic query that goes from 6 seconds to 1 second just by changing one join from LEFT JOIN to LEFT HASH JOIN or LEFT LOOP JOIN. Can anyone explain why this would cause such a large improvement in performance, and why SQL Server's optimizer isn't figuring it out on its own?

Here is roughly what the SQL looks like:

SELECT
   a.[ID]
FROM
   [TableA] a
LEFT HASH JOIN
   [TableB] b
   ON b.[ID] = a.[TableB_ID]
JOIN
   [TableC] c
   ON c.[ID] = a.[TableC_ID]
WHERE
   a.[SomeDate] IS NULL AND
   a.[SomeStatus] IN ('X', 'Y', 'Z') AND
   c.[SomethingElse] = 'ABC'

Tables A and B have millions of records, and there are indexes on all the ID fields. This is on SQL Server 2005.

Edit: A colleague suggested a LEFT LOOP JOIN and it seems to have made it even faster... SQL is not one of my strengths, so I am trying to understand how these 'hints' are helping.


Answer 1:


HASH JOIN is useful when a large percentage of the rows contributes to the result set.

In your case, building a hash table on either A or B and scanning the other table is cheaper than either performing NESTED LOOPS over the index on B.ID or merging the sorted result sets, which is what the optimizer was doing before the hint.
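To see which physical join the optimizer picks on its own and what it costs, the two versions can be run side by side with I/O and timing statistics switched on. This is only a sketch: the table and column names are the placeholders from the question, and the actual numbers depend entirely on your data.

```sql
-- Report logical reads and CPU/elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- 1) Unhinted: the optimizer chooses the join algorithm and the join order.
SELECT a.[ID]
FROM [TableA] a
LEFT JOIN [TableB] b ON b.[ID] = a.[TableB_ID]
JOIN [TableC] c ON c.[ID] = a.[TableC_ID]
WHERE a.[SomeDate] IS NULL
  AND a.[SomeStatus] IN ('X', 'Y', 'Z')
  AND c.[SomethingElse] = 'ABC';

-- 2) Hinted: the join between A and B is forced to be a hash join.
SELECT a.[ID]
FROM [TableA] a
LEFT HASH JOIN [TableB] b ON b.[ID] = a.[TableB_ID]
JOIN [TableC] c ON c.[ID] = a.[TableC_ID]
WHERE a.[SomeDate] IS NULL
  AND a.[SomeStatus] IN ('X', 'Y', 'Z')
  AND c.[SomethingElse] = 'ABC';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Comparing the logical reads per table in the Messages output, together with the actual execution plans, should show whether the unhinted plan was using nested loops or a merge join and where the extra work went.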

SQL Server's optimizer did not see that, probably either because you didn't gather statistics or because your data distribution is skewed.
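If statistics are the suspect, a quick check and refresh might look like the following. It is a sketch that assumes the three tables from the question live in the default schema; note that FULLSCAN reads every row, so on tables with millions of records it can take a while.

```sql
-- When were the statistics behind each index on TableA last updated?
-- (A NULL date means no statistics exist for that index.)
SELECT name AS index_name,
       STATS_DATE(object_id, index_id) AS stats_last_updated
FROM sys.indexes
WHERE object_id = OBJECT_ID('TableA');

-- Rebuild the statistics from a full scan of each table.
UPDATE STATISTICS [TableA] WITH FULLSCAN;
UPDATE STATISTICS [TableB] WITH FULLSCAN;
UPDATE STATISTICS [TableC] WITH FULLSCAN;
```

If the unhinted query becomes fast after this, the hint can be dropped. If it does not, skewed data or a bad join order is the more likely cause.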

Update:

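Since you mentioned that LOOP JOIN improved the speed as well, it may be that the optimizer chose the JOIN order incorrectly. In SQL Server, specifying a join hint also makes the optimizer keep the join order as written in the query, so either hint changes the order as well as the algorithm.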
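One way to test whether the join order, rather than the join algorithm, is what matters is to fix only the order and leave the algorithm to the optimizer. A minimal sketch using the query from the question:

```sql
-- FORCE ORDER keeps the joins in the order they are written (A, then B, then C)
-- but still lets the optimizer pick hash, loop, or merge for each join.
SELECT a.[ID]
FROM [TableA] a
LEFT JOIN [TableB] b ON b.[ID] = a.[TableB_ID]
JOIN [TableC] c ON c.[ID] = a.[TableC_ID]
WHERE a.[SomeDate] IS NULL
  AND a.[SomeStatus] IN ('X', 'Y', 'Z')
  AND c.[SomethingElse] = 'ABC'
OPTION (FORCE ORDER);
```

If this runs about as fast as the hinted versions, the original slowdown most likely came from the order the optimizer picked, not from the join type.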



Source: https://stackoverflow.com/questions/1395582/why-does-hash-join-or-loop-join-improve-this-stored-proc
