Tag consecutive non zero rows into distinct partitions?

走远了吗. Submitted on 2020-01-25 05:38:25

Question


Suppose we have this simple schema and data:

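-- Drop the temp table only if it already exists, so the script also runs the first time
IF OBJECT_ID('tempdb..#builds') IS NOT NULL DROP TABLE #builds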
CREATE TABLE #builds (
    Id INT IDENTITY(1,1) NOT NULL,
    StartTime INT,
    IsPassed BIT
)
INSERT INTO #builds (StartTime, IsPassed) VALUES
(1, 1),
(7, 1),
(10, 0),
(15, 1),
(21, 1),
(26, 0),
(34, 0),
(44, 0),
(51, 1),
(60, 1)

SELECT StartTime, IsPassed, NextStartTime,
    CASE IsPassed WHEN 1 THEN 0 ELSE NextStartTime - StartTime END Duration
FROM (
    SELECT  
        LEAD(StartTime) OVER (ORDER BY StartTime) NextStartTime,
        StartTime, IsPassed
    FROM #builds
) x
ORDER BY StartTime

It produces the following result set:

StartTime   IsPassed    NextStartTime   Duration
1           1           7               0
7           1           10              0
10          0           15              5
15          1           21              0
21          1           26              0
26          0           34              8
34          0           44              10
44          0           51              7
51          1           60              0
60          1           NULL            0

I need to sum each run of consecutive non-zero Duration values and show the total at the StartTime of the first row in the run. That is, I need to get this:

StartTime   Duration
10          5
26          25

I just can't figure out how to do it.

PS: The real table contains many more rows, of course.


Answer 1:


This is a gaps-and-islands problem: each run of rows in which IsPassed is constant has to be put into its own group. That can be done by taking the difference between ROW_NUMBER() over the entire table and ROW_NUMBER() partitioned by IsPassed; the difference stays constant within a run. You can then SUM the Duration values for each group where IsPassed = 0 and take MIN(StartTime) to get the StartTime of the first row of the group:

WITH CTE AS (
  SELECT StartTime, IsPassed,
         LEAD(StartTime) OVER (ORDER BY StartTime) AS NextStartTime
  FROM #builds
),
CTE2 AS (
  SELECT StartTime, IsPassed, NextStartTime,
         CASE IsPassed WHEN 1 THEN 0 ELSE NextStartTime - StartTime END Duration,
         ROW_NUMBER() OVER (ORDER BY StartTime) -
         ROW_NUMBER() OVER (PARTITION BY IsPassed ORDER BY StartTime) AS grp
  FROM CTE
)
SELECT MIN(StartTime) AS StartTime, SUM(Duration) AS Duration
FROM CTE2
WHERE IsPassed = 0
GROUP BY grp
ORDER BY MIN(StartTime)

Output:

StartTime   Duration
10          5
26          25

Demo on dbfiddle
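
To see why the row-number difference separates the runs, it can help to look at the intermediate values. The query below is only a sketch against the #builds table from the question; the column names rn_all and rn_part are illustrative and do not appear in the answer.

```sql
-- Show both row numbers and their difference for every row
SELECT StartTime, IsPassed,
       ROW_NUMBER() OVER (ORDER BY StartTime) AS rn_all,
       ROW_NUMBER() OVER (PARTITION BY IsPassed ORDER BY StartTime) AS rn_part,
       ROW_NUMBER() OVER (ORDER BY StartTime) -
       ROW_NUMBER() OVER (PARTITION BY IsPassed ORDER BY StartTime) AS grp
FROM #builds
ORDER BY StartTime
```

With the sample data this gives:

StartTime   IsPassed    rn_all      rn_part     grp
1           1           1           1           0
7           1           2           2           0
10          0           3           1           2
15          1           4           3           1
21          1           5           4           1
26          0           6           2           4
34          0           7           3           4
44          0           8           4           4
51          1           9           5           4
60          1           10          6           4

Note that grp = 4 appears for both the failed run (26, 34, 44) and the passed run (51, 60). A run is identified by the pair (IsPassed, grp), so grouping by grp alone is safe only because the answer filters on IsPassed = 0 first.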




Answer 2:


Your approach is unnecessarily complicated. You simply need to assign each run of 0s to a group that also contains the 1 immediately following it.

You can do this by counting the number of "1"s on or after each row. Of course, this also assigns a group to rows that have no "0"s before them. These groups can be filtered out by ensuring that there is at least one 0 in each group:

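select min(StartTime) as StartTime,
       max(StartTime) - min(StartTime) as Duration
from (select b.*,
             -- running count of passed builds, walking backwards in time
             sum(case when IsPassed = 1 then 1 else 0 end) over (order by StartTime desc) as grp
      from #builds b
     ) b
group by grp
-- keep only groups that contain at least one failed build
having min(convert(int, IsPassed)) = 0
order by min(StartTime);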

Here is a db<>fiddle.
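
As a rough illustration of the grouping, the query below lists the grp value for each row. It assumes the #builds table from the question and that StartTime values are unique (with ties, the default window frame would treat the tied rows as peers).

```sql
-- Running count of passed builds on or after each row
SELECT StartTime, IsPassed,
       SUM(CASE WHEN IsPassed = 1 THEN 1 ELSE 0 END)
           OVER (ORDER BY StartTime DESC) AS grp
FROM #builds
ORDER BY StartTime
```

With the sample data this gives:

StartTime   IsPassed    grp
1           1           6
7           1           5
10          0           4
15          1           4
21          1           3
26          0           2
34          0           2
44          0           2
51          1           2
60          1           1

Groups 4 and 2 are the only ones containing a 0. Each ends with the 1 that closes the failed run, so max(StartTime) - min(StartTime) yields 15 - 10 = 5 and 51 - 26 = 25. The remaining groups each hold a single passed row and are dropped by the HAVING clause.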

An alternative method needs no GROUP BY at all. It simply finds the StartTime of the next "1" for each row and then keeps only the first "0" row of each run:

select StartTime, next_1_starttime - StartTime as Duration
from (select b.*,
             lag(IsPassed) over (order by StartTime) as prev_IsPassed,
             -- earliest passed build on or after the current row
             min(case when IsPassed = 1 then StartTime end) over (order by StartTime desc) as next_1_starttime
      from #builds b
     ) b
-- a failed build starts a run if the previous build passed or there is no previous build
where IsPassed = 0 and (prev_IsPassed = 1 or prev_IsPassed is null)
order by StartTime;

This probably has the best performance of the alternatives.
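
The two helper columns can be inspected on their own to check the filter. This is a sketch against the #builds table from the question, again assuming unique StartTime values.

```sql
-- Previous build's status and the StartTime of the next passed build, per row
SELECT StartTime, IsPassed,
       LAG(IsPassed) OVER (ORDER BY StartTime) AS prev_IsPassed,
       MIN(CASE WHEN IsPassed = 1 THEN StartTime END)
           OVER (ORDER BY StartTime DESC) AS next_1_starttime
FROM #builds
ORDER BY StartTime
```

With the sample data this gives:

StartTime   IsPassed    prev_IsPassed   next_1_starttime
1           1           NULL            1
7           1           1               7
10          0           1               15
15          1           0               15
21          1           1               21
26          0           1               51
34          0           0               51
44          0           0               51
51          1           0               51
60          1           1               60

Only the rows at StartTime 10 and 26 have IsPassed = 0 together with a previous value of 1 or NULL, and their differences 15 - 10 and 51 - 26 match the expected durations 5 and 25.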



Source: https://stackoverflow.com/questions/59889916/tag-consecutive-non-zero-rows-into-distinct-partitions
