TSQL Finding Order that occurred in 3 consecutive months

前端未结

关注

 4  748

悲哀的现实 2021-01-05 00:56

Please help me to generate the following query. Say I have customer table and order table.

Customer Table

CustID CustName

1      AA     
2      BB


      
      
        
          4条回答        

        
                    
            
            
                         
                
              
              
                
                   温柔的废话
                                             
                
                
                (楼主)
            
              
              
                2021-01-05 01:20
              

            
            
                        
Here is my version. I really was presenting this as a mere curiosity, to show another way of thinking about the problem. It turned out to be more useful than that because it performed better than even Martin Smith's cool "grouped islands" solution. Though, once he got rid of some overly expensive aggregate windowing functions and did real aggregates instead, his query started kicking butt.

Solution 1: Runs of 3 months or more, done by checking 1 month ahead and behind and using a semi-join against that.

WITH Months AS (
   SELECT DISTINCT
      O.CustID,
      Grp = DateDiff(Month, '20000101', O.OrderDate)
   FROM
      CustOrder O
), Anchors AS (
   SELECT
      M.CustID,
      Ind = M.Grp + X.Offset
   FROM
      Months M
      CROSS JOIN (
         SELECT -1 UNION ALL SELECT 0 UNION ALL SELECT 1
      ) X (Offset)
   GROUP BY
      M.CustID,
      M.Grp + X.Offset
   HAVING
      Count(*) = 3
)
SELECT
   C.CustName,
   [Year] = Year(OrderDate),
   O.OrderDate
FROM
   Cust C
   INNER JOIN CustOrder O ON C.CustID = O.CustID
WHERE
   EXISTS (
      SELECT 1
      FROM
         Anchors A
      WHERE
         O.CustID = A.CustID
         AND O.OrderDate >= DateAdd(Month, A.Ind, '19991201')
         AND O.OrderDate < DateAdd(Month, A.Ind, '20000301')
   )
ORDER BY
   C.CustName,
   OrderDate;


Solution 2: Exact 3-month patterns. If it is a 4-month or greater run, the values are excluded. This is done by checking 2 months ahead and two months behind (essentially looking for the pattern N, Y, Y, Y, N).

WITH Months AS (
   SELECT DISTINCT
      O.CustID,
      Grp = DateDiff(Month, '20000101', O.OrderDate)
   FROM
      CustOrder O
), Anchors AS (
   SELECT
      M.CustID,
      Ind = M.Grp + X.Offset
   FROM
      Months M
      CROSS JOIN (
         SELECT -2 UNION ALL SELECT -1 UNION ALL SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2
      ) X (Offset)
   GROUP BY
      M.CustID,
      M.Grp + X.Offset
   HAVING
      Count(*) = 3
      AND Min(X.Offset) = -1
      AND Max(X.Offset) = 1
)
SELECT
   C.CustName,
   [Year] = Year(OrderDate),
   O.OrderDate
FROM
   Cust C
   INNER JOIN CustOrder O ON C.CustID = O.CustID
   INNER JOIN Anchors A
      ON O.CustID = A.CustID
      AND O.OrderDate >= DateAdd(Month, A.Ind, '19991201')
      AND O.OrderDate < DateAdd(Month, A.Ind, '20000301')
ORDER BY
   C.CustName,
   OrderDate;


Here's my table-loading script if anyone else wants to play:

IF Object_ID('CustOrder', 'U') IS NOT NULL DROP TABLE CustOrder
IF Object_ID('Cust', 'U') IS NOT NULL DROP TABLE Cust
GO
SET NOCOUNT ON
CREATE TABLE Cust (
  CustID int identity(1,1) NOT NULL PRIMARY KEY CLUSTERED,
  CustName varchar(100) UNIQUE
)

CREATE TABLE CustOrder (
   OrderID int identity(100, 1) NOT NULL PRIMARY KEY CLUSTERED,
   CustID int NOT NULL FOREIGN KEY REFERENCES Cust (CustID),
   OrderDate smalldatetime NOT NULL
)

DECLARE @i int
SET @i = 1000
WHILE @i > 0 BEGIN
   WITH N AS (
      SELECT
         Nm =
            Char(Abs(Checksum(NewID())) % 26 + 65)
            + Char(Abs(Checksum(NewID())) % 26 + 97)
            + Char(Abs(Checksum(NewID())) % 26 + 97)
            + Char(Abs(Checksum(NewID())) % 26 + 97)
            + Char(Abs(Checksum(NewID())) % 26 + 97)
            + Char(Abs(Checksum(NewID())) % 26 + 97)
   )
   INSERT Cust
   SELECT N.Nm
   FROM N
   WHERE NOT EXISTS (
      SELECT 1
      FROM Cust C
      WHERE
         N.Nm = C.CustName
   )

   SET @i = @i - @@RowCount
END
WHILE @i < 50000 BEGIN
   INSERT CustOrder
   SELECT TOP (50000 - @i)
      Abs(Checksum(NewID())) % 1000 + 1,
      DateAdd(Day, Abs(Checksum(NewID())) % 10000, '19900101')
   FROM master.dbo.spt_values
   SET @i = @i + @@RowCount
END


Performance

Here are some performance testing results for the 3-month-or-more queries:

Query     CPU   Reads Duration
Martin 1  2297 299412   2348 
Martin 2   625    285    809
Denis     3641    401   3855
Erik      1855  94727   2077


This is only one run of each, but the numbers are fairly representative. It turns out that your query wasn't so badly-performing, Denis, after all. Martin's query beats the others hands down, but at first was using some overly-expensive windowing functions strategies that he fixed.

Of course, as I noted, Denis's query isn't pulling the right rows when a customer has two orders on the same day, so his query is out of contention unless he fixed is.

Also, different indexes could possibly shake things up. I don't know.
    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它4个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复