SQL Query Previous Row Optimisation

粉色の甜心 2020-12-06 23:55

Here is my table structure:

MyTable
-----------

ObjectID int (Identity),           -- Primary Key
FileName varchar(10),
CreatedDate datetime
...........
....         


        
2 Answers
  • 2020-12-07 00:06
    SELECT t1.FileName, t1.CreatedDate, t2.CreatedDate AS PrevCreatedDate
    FROM 
       (SELECT FileName, CreatedDate,
              ROW_NUMBER() OVER(PARTITION BY FileName ORDER BY CreatedDate) AS OrderNo
       FROM MyTable) t1
    LEFT JOIN
       (SELECT FileName, CreatedDate,
         ROW_NUMBER() OVER(PARTITION BY FileName ORDER BY CreatedDate) AS OrderNo
         FROM MyTable) t2
    -- t2 is the row immediately before t1 for the same FileName
    ON (t1.FileName = t2.FileName AND t1.OrderNo = t2.OrderNo + 1)
    

    Or it may be better to use 'WITH' (a common table expression), because the two subqueries are identical:

    WITH t(ObjectID, FileName, CreatedDate, OrderNo) AS
       (SELECT ObjectID, FileName, CreatedDate,
              ROW_NUMBER() OVER(PARTITION BY FileName ORDER BY CreatedDate) AS OrderNo
       FROM MyTable) 
    SELECT t1.ObjectID, t1.FileName, t1.CreatedDate, t2.CreatedDate AS PrevCreatedDate,
            DATEDIFF("SS", '1900-01-01 00:00:00', 
               COALESCE((t1.CreatedDate - t2.CreatedDate),0)) AS secondsTaken
    FROM t t1 LEFT JOIN t t2 
    ON (t1.FileName = t2.FileName AND t1.OrderNo = t2.OrderNo + 1)
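The self-join exists only to line each row up with its predecessor. On SQL Server 2012 and later, the LAG window function returns the previous row's value within a partition directly, so no join is needed. This sketch is not taken from either answer; it assumes SQL Server 2012+ (the question does not state a version) and uses a hypothetical temporary table #MyTable, with made-up sample rows, in place of the real MyTable. It keeps the answers' seconds calculation unchanged.

```sql
-- Sketch assuming SQL Server 2012+ (LAG is not available in earlier versions).
-- #MyTable and #LagResult are hypothetical stand-ins for illustration only.
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL DROP TABLE #MyTable;
IF OBJECT_ID('tempdb..#LagResult') IS NOT NULL DROP TABLE #LagResult;

CREATE TABLE #MyTable
  (
     ObjectID    INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
     FileName    VARCHAR(10) NULL,
     CreatedDate DATETIME NULL
  );

-- Made-up sample rows (ISO 8601 literals convert independently of DATEFORMAT)
INSERT INTO #MyTable (FileName, CreatedDate)
VALUES ('a.txt', '2020-12-06T10:00:00'),
       ('a.txt', '2020-12-06T10:00:30'),
       ('b.txt', '2020-12-06T09:00:00'),
       ('a.txt', '2020-12-06T10:02:00');

SELECT ObjectID,
       FileName,
       CreatedDate,
       -- CreatedDate of the previous row for the same FileName; NULL for the first row
       LAG(CreatedDate) OVER (PARTITION BY FileName ORDER BY CreatedDate) AS PrevCreatedDate,
       -- Same seconds calculation as the answers; COALESCE turns "no previous row" into 0
       DATEDIFF(SECOND, '1900-01-01 00:00:00',
                COALESCE(CreatedDate - LAG(CreatedDate) OVER (PARTITION BY FileName ORDER BY CreatedDate), 0)) AS secondsTaken
INTO   #LagResult
FROM   #MyTable;

SELECT *
FROM   #LagResult
ORDER  BY FileName, CreatedDate;
```

With these sample rows, the a.txt rows at 10:00:30 and 10:02:00 get secondsTaken values of 30 and 90, and the first row of each file gets a NULL PrevCreatedDate and 0 seconds. The datetime subtraction inside DATEDIFF works for datetime columns but not for datetime2, which does not support the minus operator. Because there is no join, this form should also avoid the join row-count estimate discussed in the second answer, although it has not been benchmarked here.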
    
  • 2020-12-07 00:16

    I think Michael's answer should indeed prove more efficient. When evaluating efficiency, though, I want to draw attention to a problem with the "query cost (relative to the batch)" figures shown in Management Studio.

    I set up a test table with 23,174 rows and ran both the query in the question and Michael's. In the actual execution plan, the "query cost (relative to the batch)" was 1% for the original query and 99% for Michael's, which makes Michael's appear massively inefficient.

    However, the actual statistics tell a completely different story:

    CROSS APPLY approach

    SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.
    Table 'MyTable'. Scan count 23175, logical reads 49335, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    ROW_NUMBER approach

    SQL Server Execution Times: CPU time = 391 ms, elapsed time = 417 ms.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'MyTable'. Scan count 2, logical reads 148, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    In the ROW_NUMBER plan, the Merge Join on rownumber = rownumber + 1 has 23,174 rows going into both sides. The join value is unique, so the actual number of rows out is also 23,174. However, SQL Server estimates that the join will produce 34,812,000 rows, so its estimated cost for the insert later in the plan is wildly inaccurate.

    Test Script

    BEGIN TRAN
    
    CREATE TABLE MyTable
      (
         [ObjectID]    [INT] IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED,
         [FileName]    [VARCHAR](50) NULL,
         [CreatedDate] [DATETIME] NULL
      )
    
    GO
    
    INSERT INTO MyTable
    SELECT ISNULL(type, NEWID()),
           DATEADD(DAY, CAST(RAND(CAST(NEWID() AS VARBINARY)) * 10000 AS INT), GETDATE())
    FROM   master.dbo.spt_values,
           (SELECT TOP 10 1 AS X FROM  master.dbo.spt_values) V
    
    
    DELETE FROM MyTable
    WHERE  EXISTS(SELECT *
                  FROM   MyTable m2
                  WHERE  MyTable.CreatedDate = m2.CreatedDate
                         AND MyTable.FileName = m2.FileName
                         AND MyTable.ObjectID < m2.ObjectID)
    
    CREATE UNIQUE NONCLUSTERED INDEX [IX_MyTable]
      ON MyTable ([FileName] ASC, [CreatedDate] ASC)
    
    SET STATISTICS IO ON
    SET STATISTICS TIME ON
    
    SELECT A.ObjectID,
           A.FileName,
           A.CreatedDate                                                                                AS CreatedDate,
           B.PrevRowCreatedDate,
           DATEDIFF("SS", '1900-01-01 00:00:00', COALESCE(( A.CreatedDate - B.PrevRowCreatedDate ), 0)) AS secondsTaken
    INTO   #A
    FROM   MyTable AS A
           CROSS APPLY ((SELECT PrevRowCreatedDate = MAX(CreatedDate)
                         FROM   MyTable AS BB
                         WHERE  BB.FileName = A.FileName
                                AND BB.CreatedDate < A.CreatedDate)) AS B;
    
    WITH t(ObjectID, FileName, CreatedDate, OrderNo)
         AS (SELECT ObjectID,
                    FileName,
                    CreatedDate,
                    RANK() OVER(PARTITION BY FileName ORDER BY CreatedDate) AS OrderNo
             FROM   MyTable)
    SELECT t1.ObjectID,
           t1.FileName,
           t1.CreatedDate,
           t2.CreatedDate                                                                          AS PrevCreatedDate,
           DATEDIFF("SS", '1900-01-01 00:00:00', COALESCE(( t1.CreatedDate - t2.CreatedDate ), 0)) AS secondsTaken
    INTO   #B
    FROM   t t1
           LEFT JOIN t t2
             ON ( t1.FileName = t2.FileName
                  AND t1.OrderNo = t2.OrderNo + 1 )
    
    /*Test the 2 queries give the same result*/
    SELECT *
    FROM   #A
    EXCEPT
    SELECT *
    FROM   #B
    
    SELECT *
    FROM   #B
    EXCEPT
    SELECT *
    FROM   #A
    
    ROLLBACK 
    