T-SQL MERGE Performance in typical publishing context

廉价感情. 提交于 2019-12-01 01:34:04
Martin Smith

Subtree costs should be taken with a large grain of salt (and especially so when you have huge cardinality errors). SET STATISTICS IO ON; SET STATISTICS TIME ON; output is a better indicator of actual performance.

The zero row sort doesn't take 87% of the resources. This problem in your plan is one of statistics estimation. The costs shown in the actual plan are still estimated costs. It doesn't adjust them to take account of what actually happened.

There is a point in the plan where a filter reduces 1,911,721 rows to 0 but the estimated rows going forward are 1,860,310. Thereafter all costs are bogus culminating in the 87% cost estimated 3,348,560 row sort.

The cardinality estimation error can be reproduced outside the Merge statement by looking at the estimated plan for the Full Outer Join with equivalent predicates (gives same 1,860,310 row estimate).

SELECT * 
FROM TargetTable T
FULL OUTER JOIN  @tSource S
    ON S.Key1 = T.Key1 and S.Key2 = T.Key2
WHERE 
CASE WHEN S.Key1 IS NOT NULL 
     /*Matched by Source*/
     THEN CASE WHEN T.Key1 IS NOT NULL  
               /*Matched by Target*/
               THEN CASE WHEN  [T].[Data1]<>S.[Data1] OR 
                               [T].[Data2]<>S.[Data2] OR 
                               [T].[Data3]<>S.[Data3]
                         THEN (1) 
                     END 
                /*Not Matched by Target*/     
                ELSE (4) 
           END 
       /*Not Matched by Source*/     
      ELSE CASE WHEN  [T].[Key1]=@id 
                THEN (3) 
            END 
END IS NOT NULL

That said however the plan up to the filter itself does look quite sub optimal. It is doing a full clustered index scan when perhaps you want a plan with 2 clustered index range seeks. One to retrieve the single row matched by the primary key from the join on source and the other to retrieve the T.Key1 = @id range (though maybe this is to avoid the need to sort into clustered key order later?)

Perhaps you could try this rewrite and see if it works any better or worse

;WITH FilteredTarget AS
(
SELECT T.*
FROM TargetTable  AS T WITH (FORCESEEK)
JOIN @tSource S
    ON (T.Key1 = S.Key1
    AND S.Key2 = T.Key2)
    OR T.Key1 = @id
)
MERGE FilteredTarget AS T
USING @tSource S
ON (T.Key1 = S.Key1
   AND S.Key2 = T.Key2)


-- Only update if the Data columns do not match
WHEN MATCHED AND S.Key1 = T.Key1 AND S.Key2 = T.Key2 AND 
                                         (T.Data1 <> S.Data1 OR
                                          T.Data2 <> S.Data2 OR 
                                          T.Data3 <> S.Data3) THEN
  UPDATE SET T.Data1 = S.Data1,
             T.Data2 = S.Data2,
             T.Data3 = S.Data3

-- Note from original poster: This extra "safety clause" turned out not to
-- affect the behavior or the execution plan, so I removed it and it works
-- just as well without, but if you find yourself in a similar situation
-- you might want to give it a try.
-- WHEN MATCHED AND (S.Key1 <> T.Key1 OR S.Key2 <> T.Key2) AND T.Key1 = @id THEN
--   DELETE

-- Insert when missing in the target
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Key1, Key2, Data1, Data2, Data3)
    VALUES (Key1, Key2, Data1, Data2, Data3)

WHEN NOT MATCHED BY SOURCE AND T.Key1 = @id THEN
    DELETE;
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!