How do I exclude rows when an incremental value starts over?

Posted by 主宰稳场 on 2019-12-07 12:08:47

Question


I'm new to posting here, but I've spent a lot of time researching answers. I can't work out how to build a result set in SQL Server 2008 R2 for a problem that would probably use LEAD/LAG in more recent versions. I need to aggregate data based on the sequencing of one column, but each sequence can contain a varying number of rows. The only way to tell that a sequence has ended is that the next row has a lower sequence number. So the values may run 1-2, 1-2-3-4, 1-2-3, and I have to produce 3 aggregates from that.

The source data comes from joined tables and looks like this:

recordID  instanceDate    moduleID  iResult  interactionNum
1356      10/6/15 16:14   1         68       1
1357      10/7/15 16:22   1         100      2
1434      10/9/15 16:58   1         52       1
1435      10/11/15 17:00  1         60       2
1436      10/15/15 16:57  1         100      3
1437      10/15/15 16:59  1         100      4

I need to find a way to separate the first 2 rows from the last 4 rows in this example, based on values in the last column.

What I would love to ultimately get is a result set that looks like this, which averages the iResult column based on the grouping and takes the first instanceDate from the grouping:

instanceDate    moduleID    iResult
10/6/15           1          84
10/9/15           1          78

I can get this result with MIN and AVG if I can just find a way to separate the groups. The data is ordered by instanceDate (please ignore the date formatting here) and then by interactionNum. A new group should start whenever the query finds a row whose interactionNum is less than or equal to that of the previous row. The sequence will usually start over at 1, but not always, so I would prefer to separate on any lower or equal integer value.
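
To make the grouping rule concrete, here is a minimal Python sketch (not part of the original question) that applies it to the six sample rows. The names `split_runs` and `summarize` and the tuple layout are assumptions made for illustration. It starts a new group whenever interactionNum is less than or equal to the previous row's value, then takes the earliest instanceDate and the average iResult per group. It treats all rows as one module, which holds for the sample.

```python
from datetime import datetime

# Sample rows from the question: (instanceDate, moduleID, iResult, interactionNum),
# already ordered by instanceDate.
ROWS = [
    (datetime(2015, 10, 6, 16, 14), 1, 68, 1),
    (datetime(2015, 10, 7, 16, 22), 1, 100, 2),
    (datetime(2015, 10, 9, 16, 58), 1, 52, 1),
    (datetime(2015, 10, 11, 17, 0), 1, 60, 2),
    (datetime(2015, 10, 15, 16, 57), 1, 100, 3),
    (datetime(2015, 10, 15, 16, 59), 1, 100, 4),
]


def split_runs(rows):
    """Start a new group whenever interactionNum does not increase."""
    groups = []
    prev_num = None
    for row in rows:
        num = row[3]
        if prev_num is None or num <= prev_num:
            groups.append([])
        groups[-1].append(row)
        prev_num = num
    return groups


def summarize(groups):
    """Return (first instanceDate, moduleID, average iResult) per group."""
    return [
        (min(r[0] for r in g), g[0][1], sum(r[2] for r in g) / len(g))
        for g in groups
    ]


SUMMARY = summarize(split_runs(ROWS))
for first_date, module_id, avg_result in SUMMARY:
    print(first_date.date(), module_id, avg_result)
```

For the sample rows this yields two groups, with averages 84 and 78, which matches the desired result set above.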

Here is the query I have so far (includes the joins that give the above data set):

SELECT 
    X.* 
FROM
   (SELECT TOP 100 PERCENT   
        instanceDate, b.ModuleID, iResult, b.interactionNum 
    FROM 
        (firstTable a  
    INNER JOIN 
        secondTable b ON b.someID = a.someID)       
    WHERE 
        a.someID = 2        
        AND b.otherID LIKE 'xyz'    
        AND a.ModuleID = 1
    ORDER BY 
        instanceDate) AS  X

OUTER APPLY

(SELECT TOP 1 
     *
 FROM
     (SELECT    
          instanceDate, d.ModuleID, iResult, d.interactionNum   
      FROM 
          (firstTable c  
      INNER JOIN 
          secondTable d ON d.someID = c.someID) 
      WHERE 
          c.someID = 2      
          AND d.otherID LIKE 'xyz'  
          AND c.ModuleID = 1    
          AND d.interactionNum = X.interactionNum
          AND c.instanceDate < X.instanceDate)  X2
      ORDER BY 
          instanceDate DESC) Y
WHERE 
    NOT EXISTS (SELECT Y.interactionNum INTERSECT SELECT X.interactionNum)

But this is returning an interim result set like this:

instanceDate    ModuleID    iResult interactionNum
10/6/15 16:10   1            68         1
10/6/15 16:14   1            100        2
10/15/15 16:57  1            100        3
10/15/15 16:59  1            100        4

and the problem is that interactionNum 3, 4 do not belong in this result set. They would go in the next result set when I loop over this query. How do I keep them out of the result set in this iteration? I need the result set from this query to just include the first two rows, 'seeing' that row 3 of the source data has a lower value for interactionNum than row 2 has.


Answer 1:


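I'm not sure how ModuleID is supposed to be used, but I guess you're looking for something like this:

select min(instanceDate) as instanceDate, [moduleID], avg([iResult]) as iResult
from (
  -- RN numbers the rows of each module in date order
  select *, row_number() over (partition by [moduleID] order by instanceDate) as RN
  from Table1
) X
-- RN - interactionNum stays constant while interactionNum rises by 1 per row
group by [moduleID], RN - [interactionNum]
order by min(instanceDate)

The idea here is to create a running number with row_number for each moduleID, and then use the difference between that number and interactionNum as the grouping criterion. The difference stays constant while interactionNum increases by 1 from row to row, and it changes as soon as the sequence restarts.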
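
As a rough illustration of why the difference works as a grouping key, the following Python sketch (not part of the original answer; the variable names are made up) computes it for the sample interactionNum values. It also shows one assumed edge case that is not in the question's data: a gap inside a run changes the key mid-run.

```python
# interactionNum values from the question's sample, ordered by instanceDate.
interaction_nums = [1, 2, 1, 2, 3, 4]

# ROW_NUMBER() equivalent: 1, 2, 3, ... in the same order.
row_numbers = range(1, len(interaction_nums) + 1)

# The grouping key used in the GROUP BY: RN - interactionNum.
group_keys = [rn - num for rn, num in zip(row_numbers, interaction_nums)]
print(group_keys)  # [0, 0, 2, 2, 2, 2]

# Assumed extra case (not from the question): a gap inside a run (1, 3)
# changes the key mid-run, so the run would be split in two.
gap_keys = [rn - num for rn, num in zip(range(1, 4), [1, 3, 4])]
print(gap_keys)  # [0, -1, -1]
```

So the approach relies on interactionNum increasing by exactly 1 within each sequence.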

Example in SQL Fiddle




Answer 2:


Here is my solution, although it should be said that I think @JamesZ's answer is cleaner.

I created a new field called newinstance, which is 1 wherever interactionNum is 1. I then created a running SUM(newinstance) called rollinginstance to group on. Note that this only detects a restart when the sequence starts over at exactly 1.

Change the last select to SELECT * FROM cte2 to show all the fields I added.

IF OBJECT_ID('tempdb..#tmpData') IS NOT NULL
    DROP TABLE #tmpData

CREATE TABLE #tmpData (recordID INT, instanceDate DATETIME, moduleID INT, iResult INT, interactionNum INT)

INSERT INTO #tmpData
SELECT 1356,'10/6/15 16:14',1,68,1 UNION
SELECT 1357,'10/7/15 16:22',1,100,2 UNION
SELECT 1434,'10/9/15 16:58',1,52,1 UNION
SELECT 1435,'10/11/15 17:00',1,60,2 UNION
SELECT 1436,'10/15/15 16:57',1,100,3 UNION
SELECT 1437,'10/15/15 16:59',1,100,4

;WITH cte1 AS
(
    -- Flag the first row of each sequence and number the rows in recordID order
    SELECT *,
           CASE WHEN interactionNum = 1 THEN 1 ELSE 0 END AS newinstance,
           ROW_NUMBER() OVER (ORDER BY recordID) AS rowid
    FROM #tmpData
), cte2 AS
(
    -- Running total of the flags: every row in a sequence gets the same value
    SELECT *,
           (SELECT SUM(newinstance) FROM cte1 b WHERE b.rowid <= a.rowid) AS rollinginstance
    FROM cte1 a
)
SELECT MIN(instanceDate) AS instanceDate, moduleID, AVG(iResult) AS iResult
FROM cte2
GROUP BY moduleID, rollinginstance
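
The flag-and-running-sum idea can be traced by hand with a short Python sketch (an illustration only, not the answerer's code). The second half is an assumed variant that flags any row whose value is less than or equal to the previous one, which is the rule stated in the question; the names `restart_flags` and `rolling_by_drop` are hypothetical.

```python
from itertools import accumulate

# interactionNum values from the sample data, in recordID order.
interaction_nums = [1, 2, 1, 2, 3, 4]

# newinstance: 1 where interactionNum is 1, else 0 (as in cte1).
new_instance = [1 if n == 1 else 0 for n in interaction_nums]

# rollinginstance: running total of the flag (as in cte2).
rolling_instance = list(accumulate(new_instance))
print(rolling_instance)  # [1, 1, 2, 2, 2, 2]

# Variant (an assumption, not the answer's code): flag any row whose value
# is less than or equal to the previous one, matching the question's rule.
restart_flags = [
    1 if i == 0 or n <= interaction_nums[i - 1] else 0
    for i, n in enumerate(interaction_nums)
]
rolling_by_drop = list(accumulate(restart_flags))
print(rolling_by_drop)  # [1, 1, 2, 2, 2, 2]
```

Both versions give the same groups for the sample data. They would differ only if a sequence restarted at a value other than 1.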


Source: https://stackoverflow.com/questions/33811993/how-do-i-exclude-rows-when-an-incremental-value-starts-over
