CTE slow performance on Left join

旧巷少年郎 2021-01-21 02:12

I need to provide a report that shows all users on a table and their scores. Not all users on said table will have a score, so in my solution I calculate the score first using a

2 Answers
  • 2021-01-21 02:57

    UPDATE

    I accepted Alan's answer and ended up doing the following. I'm posting examples in the hope that the formatting helps someone; it slowed me down a bit... or maybe I am just slow, heh heh.

    1. Changed my scalar UDFs to inline TVFs (iTVFs)

    Scalar function 1:

    ALTER FUNCTION [dbo].[fn_WorkDaysAge]
    (
        @first_date  DATETIME,
        @second_date DATETIME
    )
    RETURNS int
    AS
    BEGIN
        DECLARE @WorkDays int;

        -- Count the working days between the two dates (inclusive)
        SELECT @WorkDays = COUNT(*)
        FROM   DateDimension
        WHERE  Date BETWEEN @first_date AND @second_date
        AND    workingday = '1';

        RETURN @WorkDays;
    END
    

    iTVF function 1:

    ALTER FUNCTION [dbo].[fn_iTVF_WorkDaysAge]
    (
        @FirstDate  AS Date,
        @SecondDate AS Date
    )
    RETURNS TABLE AS RETURN
    SELECT WorkDays = COUNT(*)
    FROM   DateDimension
    WHERE  Date BETWEEN @FirstDate AND @SecondDate
    AND    workingday = '1';
    

    I then updated my next function the same way. I added the CROSS APPLY (something I've personally not used before; I'm still a newbie) as indicated below and replaced the UDF calls in my CASE expression with the column names returned by the iTVFs.

    Old Code

    INNER JOIN tblTauClassList AS T
      ON T.SaRacf = racf
    WHERE
    --FILTERS BY A ROLLING 15 BUSINESS DAYS UNLESS THE DAYS BETWEEN CURRENT DATE AND TAU START DATE ARE <15
    agent_stats.DateTime >=
        CASE WHEN dbo.fn_WorkDaysAge(TauStart, GETDATE()) <15 THEN TauStart ELSE
            dbo.fn_WorkDate15(TauStart) 
        END
    

    New Code

    INNER JOIN tblTauClassList AS T
      ON T.SaRacf = racf
    --iTVFs
    CROSS APPLY dbo.fn_iTVF_WorkDaysAge(TauStart, GETDATE()) as age
    CROSS APPLY dbo.fn_iTVF_WorkDate_15(TauStart) as roll
    WHERE
    --FILTERS BY A ROLLING 15 BUSINESS DAYS UNLESS THE DAYS BETWEEN CURRENT DATE AND TAU START DATE ARE <15
    agent_stats.DateTime >=
        CASE WHEN age.WorkDays <15 THEN TauStart ELSE
            roll.Date 
        END
    

    New code runs in 3-4 seconds. I will go back and index the appropriate tables per your recommendation and probably gain more efficiency there.

    Cannot thank you enough!
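
    A quick way to check a conversion like this is to compare the two versions side by side. This is only a sketch: it assumes TauStart is a column of tblTauClassList (the snippets above use it unqualified) and that the original scalar function still exists alongside the iTVF.

    ```sql
    -- Lists every row where the scalar and inline functions disagree.
    -- An empty result suggests the two versions are interchangeable for this data.
    SELECT T.SaRacf,
           T.TauStart,
           scalarAge = dbo.fn_WorkDaysAge(T.TauStart, GETDATE()),
           inlineAge = age.WorkDays
    FROM   tblTauClassList AS T
    CROSS APPLY dbo.fn_iTVF_WorkDaysAge(T.TauStart, GETDATE()) AS age
    WHERE  dbo.fn_WorkDaysAge(T.TauStart, GETDATE()) <> age.WorkDays;
    ```

    If any rows come back, one thing worth looking at is the parameter type change (DATETIME in the scalar version, Date in the iTVF).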

  • 2021-01-21 03:01

    As @Habo mentioned, we need the actual execution plan (e.g. run the query with "Include Actual Execution Plan" turned on). I looked over what you posted and nothing there explains the problem. The difference between the actual plan and the estimated plan is that the actual plan records the number of rows actually retrieved; this is vital for troubleshooting poorly performing queries.
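
    For reference, the actual plan can also be captured from T-SQL instead of through the SSMS menu. A minimal sketch; the SELECT is only a placeholder for the slow query, and the table name is taken from the question:

    ```sql
    SET STATISTICS XML ON;    -- returns the actual execution plan as an extra XML result
    SET STATISTICS IO ON;     -- logical/physical reads per table (Messages tab)
    SET STATISTICS TIME ON;   -- CPU and elapsed time (Messages tab)

    -- Placeholder: replace with the query being investigated
    SELECT COUNT(*) FROM tblTauClassList;

    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;
    SET STATISTICS XML OFF;
    ```

    The XML plan contains both estimated and actual row counts per operator, which is the comparison that matters here.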

    That said, I do see a HUGE problem with both queries. It's a problem that, once fixed, will improve both queries to less than a second. Your query is leveraging two scalar user defined functions (UDFs): dbo.fn_WorkDaysAge and dbo.fn_WorkDate15. Scalar UDFs ruin everything. Not only are they slow, they also force a serial execution plan, which makes any query they are used in much slower.
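
    To see how widespread the problem is, the catalog views can list the scalar UDFs in a database and the stored objects that reference them. This is a sketch using sys.objects and sys.sql_expression_dependencies; the column aliases are my own, and ad hoc queries that are not stored in the database will not show up in the second result.

    ```sql
    -- All T-SQL scalar UDFs in the current database
    SELECT schemaName   = SCHEMA_NAME(o.schema_id),
           functionName = o.name,
           o.type_desc
    FROM   sys.objects AS o
    WHERE  o.type = 'FN'   -- 'FN' = scalar function, 'IF' = inline TVF, 'TF' = multi-statement TVF
    ORDER BY schemaName, functionName;

    -- Views, procedures and functions that reference a scalar UDF
    SELECT referencingObject  = OBJECT_NAME(d.referencing_id),
           referencedFunction = o.name
    FROM   sys.sql_expression_dependencies AS d
    JOIN   sys.objects AS o
      ON   o.object_id = d.referenced_id
    WHERE  o.type = 'FN';
    ```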

    I don't have the code for dbo.fn_WorkDaysAge or dbo.fn_WorkDate15, but I have my own "WorkDays" function which is inline (code below). The syntax is a little different but the performance benefits are worth the effort. Here's the syntax difference:

    -- Scalar 
    SELECT d.*, workDays = dbo.countWorkDays_scalar(d.StartDate,d.EndDate)
    FROM   <sometable> AS d;
    
    -- Inline version
    SELECT d.*, f.workDays
    FROM   <sometable> AS d
    CROSS APPLY dbo.countWorkDays(d.StartDate,d.EndDate) AS f;
    

    Here's a performance test I put together to show the difference between an inline version vs the scalar version:

    -- SAMPLE DATA
    IF OBJECT_ID('tempdb..#dates') IS NOT NULL DROP TABLE #dates;
    
    WITH E1(x)  AS (SELECT 1 FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS x(x)),
         E3(x)  AS (SELECT 1 FROM E1 a, E1 b, E1 c),
         iTally AS (SELECT N=ROW_NUMBER() OVER (ORDER BY (SELECT 1)) FROM E3 a, E3 b)
    SELECT TOP (100000) 
      StartDate = CAST(DATEADD(DAY,-ABS(CHECKSUM(NEWID())%1000),GETDATE()) AS DATE),
      EndDate   = CAST(DATEADD(DAY,+ABS(CHECKSUM(NEWID())%1000),GETDATE()) AS DATE)
    INTO #dates
    FROM iTally;
    
    -- PERFORMANCE TESTS
    PRINT CHAR(10)+'Scalar Version (always serial):'+CHAR(10)+REPLICATE('-',60);
    GO
    DECLARE @st DATETIME = GETDATE(), @workdays INT;
      SELECT @workdays = dbo.countWorkDays_scalar(d.StartDate,d.EndDate)
      FROM   #dates AS d;
    PRINT DATEDIFF(MS,@st,GETDATE());
    GO 3
    
    PRINT CHAR(10)+'Inline Version:'+CHAR(10)+REPLICATE('-',60);
    GO
    DECLARE @st DATETIME = GETDATE(), @workdays INT;
      SELECT @workdays = f.workDays
      FROM   #dates AS d
      CROSS APPLY dbo.countWorkDays(d.StartDate,d.EndDate) AS f
    PRINT DATEDIFF(MS,@st,GETDATE());
    GO 3
    

    Results:

    Scalar Version (always serial):
    ------------------------------------------------------------
    Beginning execution loop
    380
    363
    350
    Batch execution completed 3 times.
    
    Inline Version:
    ------------------------------------------------------------
    Beginning execution loop
    47
    47
    46
    Batch execution completed 3 times.
    

    As you can see, the inline version is about 8 times faster than the scalar version. Replacing those scalar UDFs with an inline version will almost certainly speed this query up regardless of join type.

    Other problems I see include:

    1. I see a lot of index scans; this is a sign you need more filtering and/or better indexes.

    2. dbo.tblCrosswalkWghtPhnEffTarget does not have any indexes which means it will always get scanned.
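
    As a starting point for the second item, an index along these lines could support the join on dbo.tblCrosswalkWghtPhnEffTarget. Treat it as a sketch: the index name is made up, the key and included columns are inferred from the join predicates and select list in the query, and the right choice depends on the table's size and data types, which I can't see.

    ```sql
    -- Hypothetical index; columns inferred from:
    --   ON aht_target.SgId = agent_stats.SkillGroupSkillTargetID
    --   AND agent_stats.DateTime BETWEEN aht_target.StartDt AND aht_target.EndDt
    CREATE NONCLUSTERED INDEX IX_tblCrosswalkWghtPhnEffTarget_SgId_Dates
    ON dbo.tblCrosswalkWghtPhnEffTarget (SgId, StartDt, EndDt)
    INCLUDE (EnterpriseName, target);   -- covers the columns the query selects
    ```

    Check the actual plan before and after to confirm the scan on this table turns into a seek.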

    Functions used for performance test:

    -- INLINE VERSION
    ----------------------------------------------------------------------------------------------
    IF OBJECT_ID('dbo.countWorkDays') IS NOT NULL DROP FUNCTION dbo.countWorkDays;
    GO
    CREATE FUNCTION dbo.countWorkDays (@startDate DATETIME, @endDate DATETIME) 
    /*****************************************************************************************
    [Purpose]:
     Calculates the number of business days between two dates (Mon-Fri), excluding weekends.
     dbo.countWorkDays does not take holidays into consideration; for this you would need a
     separate "holiday table" to perform an antijoin against.
    
     The idea is based on the solution in this article:
       https://www.sqlservercentral.com/Forums/Topic153606.aspx?PageIndex=16
    
    [Author]:
     Alan Burstein
    
    [Compatibility]:
     SQL Server 2005+
    
    [Syntax]:
    --===== Autonomous
     SELECT f.workDays
     FROM   dbo.countWorkDays(@startdate, @enddate) AS f;
    
    --===== Against a table using APPLY
     SELECT t.col1, t.col2, f.workDays
     FROM dbo.someTable t
     CROSS APPLY dbo.countWorkDays(t.col1, t.col2) AS f;
    
    [Parameters]:
      @startDate = datetime; first date to compare
      @endDate   = datetime; date to compare @startDate to
    
    [Returns]:
     Inline Table Valued Function returns:
     workDays = int; number of work days between @startdate and @enddate
    
    [Dependencies]:
     N/A
    
    [Developer Notes]:
     1. NULL when either input parameter is NULL.
    
     2. This function is what is referred to as an "inline" scalar UDF. Technically it's an
        inline table valued function (iTVF) but performs the same task as a scalar valued user
        defined function (UDF); the difference is that it requires the APPLY table operator
        to accept column values as a parameter. For more about "inline" scalar UDFs see this
        article by SQL MVP Jeff Moden: http://www.sqlservercentral.com/articles/T-SQL/91724/
        and for more about how to use APPLY see this article by SQL MVP Paul White:
        http://www.sqlservercentral.com/articles/APPLY/69953/.
    
        Note the above syntax example and usage examples below to better understand how to
        use the function. Although the function is slightly more complicated to use than a
        scalar UDF it will yield notably better performance for many reasons. For example,
        unlike scalar UDFs or multi-statement table valued functions, the inline scalar UDF does
        not restrict the query optimizer's ability to generate a parallel query execution plan.
    
     3. dbo.countWorkDays requires that @enddate be equal to or later than @startDate. Otherwise
        a NULL is returned.
    
     4. dbo.countWorkDays is NOT deterministic. For more about deterministic functions see:
        https://msdn.microsoft.com/en-us/library/ms178091.aspx
    
    [Examples]:
     --===== 1. Basic Use
     SELECT f.workDays 
     FROM   dbo.countWorkDays('20180608', '20180611') AS f;
    
    ---------------------------------------------------------------------------------------
    [Revision History]: 
     Rev 00 - 20180625 - Initial Creation - Alan Burstein
    *****************************************************************************************/
    RETURNS TABLE WITH SCHEMABINDING AS RETURN
    SELECT workDays =
        -- Returns NULL if either date is NULL or if @endDate is earlier than @startDate
      CASE WHEN SIGN(DATEDIFF(dd, @startDate, @endDate)) > -1 THEN
                    (DATEDIFF(dd, @startDate, @endDate) + 1) --total days including weekends
                   -(DATEDIFF(wk, @startDate, @endDate) * 2) --Subtract 2 days for each full weekend
        -- Subtract 1 when startDate is a Sunday and subtract 1 when endDate is a Saturday:
        -(CASE WHEN DATENAME(dw, @startDate) = 'Sunday'   THEN 1 ELSE 0 END)
        -(CASE WHEN DATENAME(dw, @endDate)   = 'Saturday' THEN 1 ELSE 0 END)
      END;
    GO    
    
    -- SCALAR VERSION
    ----------------------------------------------------------------------------------------------
    IF OBJECT_ID('dbo.countWorkDays_scalar') IS NOT NULL DROP FUNCTION dbo.countWorkDays_scalar;
    GO
    CREATE FUNCTION dbo.countWorkDays_scalar (@startDate DATETIME, @endDate DATETIME) 
    RETURNS INT WITH SCHEMABINDING AS
    BEGIN
      RETURN
      (
        SELECT workDays =
            -- Returns NULL if either date is NULL or if @endDate is earlier than @startDate
          CASE WHEN SIGN(DATEDIFF(dd, @startDate, @endDate)) > -1 THEN
                        (DATEDIFF(dd, @startDate, @endDate) + 1) --total days including weekends
                       -(DATEDIFF(wk, @startDate, @endDate) * 2) --Subtract 2 days for each full weekend
            -- Subtract 1 when startDate is a Sunday and subtract 1 when endDate is a Saturday:
            -(CASE WHEN DATENAME(dw, @startDate) = 'Sunday'   THEN 1 ELSE 0 END)
            -(CASE WHEN DATENAME(dw, @endDate)   = 'Saturday' THEN 1 ELSE 0 END)
          END
      );
    END
    GO
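
    The header comment in the inline version says holidays need a separate holiday table. One way that could look is sketched below. Everything new here is hypothetical: the dbo.Holiday table, its HolidayDate column and the function name dbo.countWorkDays_noHolidays are made up for illustration. Rather than an antijoin, this sketch simply subtracts the number of weekday holidays in the range from the Mon-Fri count.

    ```sql
    -- Hypothetical holiday table: one row per holiday date
    CREATE TABLE dbo.Holiday (HolidayDate DATE NOT NULL PRIMARY KEY);
    GO
    CREATE FUNCTION dbo.countWorkDays_noHolidays (@startDate DATETIME, @endDate DATETIME)
    RETURNS TABLE AS RETURN
    SELECT workDays = f.workDays - h.holidays   -- NULL whenever dbo.countWorkDays returns NULL
    FROM   dbo.countWorkDays(@startDate, @endDate) AS f
    CROSS APPLY
    (
      -- Holidays that fall on a weekday inside the range (both ends inclusive);
      -- weekend holidays are skipped because dbo.countWorkDays already excludes them
      SELECT holidays = COUNT(*)
      FROM   dbo.Holiday AS hd
      WHERE  hd.HolidayDate BETWEEN CAST(@startDate AS DATE) AND CAST(@endDate AS DATE)
      AND    DATENAME(dw, hd.HolidayDate) NOT IN ('Saturday', 'Sunday')
    ) AS h;
    GO
    ```

    Like dbo.countWorkDays, this relies on DATENAME returning English day names, so it depends on the session language. CAST to DATE needs SQL Server 2008 or later.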
    

    UPDATE BASED ON OP'S QUESTION IN THE COMMENTS:

    First, here are inline table valued function versions of each function. Note that I'm using my own tables and don't have time to make the names match your environment, but I did my best to include comments in the code. Also note that if, in your function, workingday = '1' simply selects weekdays, then you'll find my function above to be a much faster alternative to your dbo.fn_WorkDaysAge function. If workingday = '1' also filters out holidays, then my function won't work as a replacement.

    CREATE FUNCTION dbo.fn_WorkDaysAge_itvf
    (
     @first_date  DATETIME,
     @second_date DATETIME
    )
    RETURNS TABLE AS RETURN
    SELECT  WorkDays = COUNT(*)
    FROM    dbo.dimdate -- DateDimension
    WHERE   DateValue   -- [date]
    BETWEEN @first_date AND @second_date
    AND     IsWeekend = 0 --workingday = '1'
    GO
    
    CREATE FUNCTION dbo.fn_WorkDate15_itvf
    (
     @TauStartDate DATETIME
    )
    RETURNS TABLE AS RETURN
    WITH DATES AS 
    (
      SELECT 
      ROW_NUMBER() OVER(Order By DateValue Desc) as RowNum, DateValue
      FROM dbo.dimdate -- DateDimension
      WHERE DateValue BETWEEN @TauStartDate AND --GETDATE() testing below 
       CASE WHEN GETDATE() < @TauStartDate + 200 THEN GETDATE() ELSE @TauStartDate + 200 END
      AND IsWeekend = 0 --workingday = '1'
    )
    --Get the 15th businessday from the current date
    SELECT DateValue
    FROM  DATES
    WHERE RowNum = 16;
    GO
    

    Now, to replace your scalar UDFs with the inline table valued functions, you would do this (note my comments):

    WITH agent_split_stats AS ( 
    Select
        racf,
        agent_stats.SkillGroupSkillTargetID,
        aht_target.EnterpriseName,
        aht_target.target,
        Sum(agent_stats.CallsHandled) as n_calls_handled,
        CASE WHEN (Sum(agent_stats.TalkInTime) + Sum(agent_stats.IncomingCallsOnHoldTime) + Sum(agent_stats.WorkReadyTime)) = 0 THEN 1 ELSE
            (Sum(agent_stats.TalkInTime) + Sum(agent_stats.IncomingCallsOnHoldTime) + Sum(agent_stats.WorkReadyTime)) END
        AS total_handle_time
    from tblAceyusAgntSklGrp as agent_stats
    INNER JOIN tblCrosswalkWghtPhnEffTarget as aht_target
      ON aht_target.SgId = agent_stats.SkillGroupSkillTargetID
      AND agent_stats.DateTime BETWEEN aht_target.StartDt and aht_target.EndDt
    INNER JOIN tblAgentMetricCrosswalk as xwalk
      ON xwalk.SkillTargetID = agent_stats.SkillTargetID
    INNER JOIN tblTauClassList AS T
      ON T.SaRacf = racf
    -- INLINE FUNCTIONS HERE:
    CROSS APPLY dbo.fn_WorkDaysAge_itvf(TauStart, GETDATE()) AS wd
    CROSS APPLY dbo.fn_WorkDate15_itvf(TauStart)             AS w15
    -- NEW WHERE CLAUSE:
    WHERE       agent_stats.DateTime >= 
                  CASE WHEN wd.workdays < 15 THEN TauStart ELSE w15.DateValue END
    And Graduated = 'No'
    AND CallsHandled <> 0
    AND Target is not null
    Group By
    racf, agent_stats.SkillGroupSkillTargetID, aht_target.EnterpriseName, aht_target.target
    ),
    agent_split_stats_with_weight AS (
    SELECT 
        agent_split_stats.*,
        agent_split_stats.n_calls_handled/SUM(agent_split_stats.n_calls_handled) OVER(PARTITION BY agent_split_stats.racf) AS [weight]
    FROM agent_split_stats
    ),
    agent_split_effectiveness AS 
    (
      SELECT 
          agent_split_stats_with_weight.*,
          (((agent_split_stats_with_weight.target * agent_split_stats_with_weight.n_calls_handled) / 
             agent_split_stats_with_weight.total_handle_time)*100)*
             agent_split_stats_with_weight.weight AS effectiveness_sum
      FROM agent_split_stats_with_weight
    ),
    agent_effectiveness AS
    (
      SELECT 
          racf AS SaRacf,
          ROUND(SUM(effectiveness_sum),2) AS WpeScore
      FROM agent_split_effectiveness
      GROUP BY racf
    ),
    tau AS
    (
      SELECT L.SaRacf, TauStart, Goal as WpeGoal 
      ,CASE WHEN agent_effectiveness.WpeScore IS NULL THEN 1 ELSE WpeScore END as WpeScore
      FROM tblTauClassList AS L
      LEFT JOIN agent_effectiveness
        ON agent_effectiveness.SaRacf = L.SaRacf
      LEFT JOIN tblCrosswalkTauGoal AS G
        ON  G.Year   = TauYear
        AND G.Bucket = 'Wpe'
      WHERE TermDate IS NULL
      AND   Graduated = 'No'
    )
    SELECT tau.*,
    -- NEW CASE STATEMENT HERE: 
    CASE WHEN wd.workdays > 14 AND WpeScore >= WpeGoal THEN 'Pass' ELSE 'Fail' END 
    from tau
    -- INLINE FUNCTIONS HERE:
    CROSS APPLY dbo.fn_WorkDaysAge_itvf(TauStart, GETDATE()) AS wd
    CROSS APPLY dbo.fn_WorkDate15_itvf(TauStart)             AS w15;
    

    Note that I can't test this right now, but it should be correct (or close).
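
    One thing worth checking when testing: CROSS APPLY behaves like an inner join, so an outer row disappears whenever the applied function returns no rows. dbo.fn_WorkDaysAge_itvf always returns exactly one row (a COUNT(*) without GROUP BY), but dbo.fn_WorkDate15_itvf returns nothing when there is no row with RowNum = 16, i.e. when fewer than 16 working days are in range. The difference can be seen in a self-contained sketch that needs none of the tables above:

    ```sql
    -- CROSS APPLY: id = 2 is dropped because the applied query returns no row for it
    SELECT t.id, x.val
    FROM   (VALUES (1), (2)) AS t(id)
    CROSS APPLY (SELECT val = 'found' WHERE t.id = 1) AS x;

    -- OUTER APPLY: both ids are kept; val is NULL for id = 2
    SELECT t.id, x.val
    FROM   (VALUES (1), (2)) AS t(id)
    OUTER APPLY (SELECT val = 'found' WHERE t.id = 1) AS x;
    ```

    If recent starters go missing from the report, switching the second function to OUTER APPLY may be the fix. A sketch, again assuming TauStart is a column of tblTauClassList:

    ```sql
    SELECT T.SaRacf, T.TauStart, wd.WorkDays, w15.DateValue
    FROM   tblTauClassList AS T
    CROSS APPLY dbo.fn_WorkDaysAge_itvf(T.TauStart, GETDATE()) AS wd   -- always one row
    OUTER APPLY dbo.fn_WorkDate15_itvf(T.TauStart)             AS w15; -- may be empty, giving NULL
    ```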
