Grouping and counting rows by value until it changes

心在旅途 2020-12-15 00:47

I have a table where messages are stored as they happen. Usually there is a message 'A', and sometimes the A's are separated by a single message 'B'. Now I want to group

2 answers
  • 2020-12-15 01:13

    Here is a slightly shorter solution:

    DECLARE @t TABLE ( d DATE, m CHAR(1) )
    
    INSERT  INTO @t
    VALUES  ( '20150301', 'A' ),
            ( '20150302', 'A' ),
            ( '20150303', 'B' ),
            ( '20150304', 'A' ),
            ( '20150305', 'A' ),
            ( '20150306', 'A' ),
            ( '20150307', 'B' );
    
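    WITH 
    -- n = 1 on rows where the message differs from the previous row, otherwise 0
    c1 AS ( SELECT d, m, IIF(LAG(m, 1, m) OVER (ORDER BY d) = m, 0, 1) AS n FROM @t ),
    -- the running total of n gives every run of equal messages its own number
    c2 AS ( SELECT d, m, SUM(n) OVER (ORDER BY d) AS n FROM c1 )
        SELECT m, COUNT(*) AS c
        FROM c2
        GROUP BY m, n
        ORDER BY MIN(d);  -- without this, the row order of the result is not guaranteed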
    

    Output:

    m   c
    A   2
    B   1
    A   3
    B   1
    

    The idea is to flag with 1 every row where the message differs from the previous row's message (the first row gets 0):

    2015-03-01  A   0
    2015-03-02  A   0
    2015-03-03  B   1
    2015-03-04  A   1
    2015-03-05  A   0
    2015-03-06  A   0
    2015-03-07  B   1
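
    To look at this first step on its own, the flag column can be selected directly. This is only a sketch that reuses the sample table from above; IIF and LAG need SQL Server 2012 or later, and the third argument of LAG makes the first row compare equal to itself, so it gets 0.

    ```sql
    DECLARE @t TABLE ( d DATE, m CHAR(1) );

    INSERT  INTO @t
    VALUES  ( '20150301', 'A' ),
            ( '20150302', 'A' ),
            ( '20150303', 'B' ),
            ( '20150304', 'A' ),
            ( '20150305', 'A' ),
            ( '20150306', 'A' ),
            ( '20150307', 'B' );

    -- LAG(m, 1, m) returns the previous row's message, or the current
    -- message when there is no previous row (the first row by date).
    SELECT  d, m,
            IIF(LAG(m, 1, m) OVER (ORDER BY d) = m, 0, 1) AS n
    FROM    @t
    ORDER BY d;
    ```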
    

    The second step is a running total: the flag of the current row plus the flags of all preceding rows:

    2015-03-01  A   0
    2015-03-02  A   0
    2015-03-03  B   1
    2015-03-04  A   2
    2015-03-05  A   2
    2015-03-06  A   2
    2015-03-07  B   3
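
    The running total can likewise be checked in isolation. The sketch below spells out the window frame with ROWS UNBOUNDED PRECEDING, which is an addition of mine and not part of the original query; with unique dates it gives the same result as the default frame, but if two rows shared a date the default RANGE frame would treat them as peers and sum them together.

    ```sql
    DECLARE @t TABLE ( d DATE, m CHAR(1) );

    INSERT  INTO @t
    VALUES  ( '20150301', 'A' ),
            ( '20150302', 'A' ),
            ( '20150303', 'B' ),
            ( '20150304', 'A' ),
            ( '20150305', 'A' ),
            ( '20150306', 'A' ),
            ( '20150307', 'B' );

    WITH c1 AS (
        -- change flag: 1 where the message differs from the previous row
        SELECT d, m, IIF(LAG(m, 1, m) OVER (ORDER BY d) = m, 0, 1) AS n
        FROM   @t
    )
    -- running total of the flags, from the first row up to the current row
    SELECT  d, m,
            SUM(n) OVER (ORDER BY d ROWS UNBOUNDED PRECEDING) AS n
    FROM    c1
    ORDER BY d;
    ```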
    

    Every run of consecutive equal messages now shares one value of the calculated column, so grouping by the message column and the calculated column yields one row per run.

  • 2020-12-15 01:29

    That was interesting :)

    ;WITH cte as (
    SELECT Messages.Message, Timestamp, 
    ROW_NUMBER() OVER(PARTITION BY Message ORDER BY Timestamp) AS gn,
    ROW_NUMBER() OVER (ORDER BY Timestamp) AS rn
    FROM Messages
    ), cte2 AS (
    SELECT Message, Timestamp, gn, rn, gn - rn  as gb
    FROM cte 
    ), cte3 AS (
    SELECT Message, MIN(Timestamp) As Ts, COUNT(1) as Cnt
    FROM cte2
    GROUP BY Message, gb)
    SELECT Message, Cnt FROM cte3
    ORDER BY Ts
    

    Here is the result set:

      Message   Cnt
        A   2
        B   1
        A   3
        B   1
    

    The query could be shorter, but I posted it this way so you can see what's happening. The result is exactly as requested.

    The most important part is gn - rn. The idea is to number the rows within each Message partition and, at the same time, number the rows in the whole set. Within a run of consecutive equal messages both numbers increase by 1 per row, so their difference stays constant; together with Message, that difference identifies each group:

    ;WITH cte as (
    SELECT Messages.Message, Timestamp, 
    ROW_NUMBER() OVER(PARTITION BY Message ORDER BY Timestamp) AS gn,
    ROW_NUMBER() OVER (ORDER BY Timestamp) AS rn
    FROM Messages
    ), cte2 AS (
    SELECT Message, Timestamp, gn, rn, gn - rn  as gb
    FROM cte 
    )
    SELECT * FROM cte2
    
    Message Timestamp           gn  rn  gb
    A   2015-03-29 00:00:00.000 1   1   0
    A   2015-03-29 00:01:00.000 2   2   0
    B   2015-03-29 00:02:00.000 1   3   -2
    A   2015-03-29 00:03:00.000 3   4   -1
    A   2015-03-29 00:04:00.000 4   5   -1
    A   2015-03-29 00:05:00.000 5   6   -1
    B   2015-03-29 00:06:00.000 2   7   -5
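
    For comparison, here is one way the query might be shortened, as hinted above. It is a sketch, not the original answer's code: the table variable @Messages and its sample rows are assumptions reconstructed from the output shown above, and Timestamp is bracketed only because it is also a type name in T-SQL.

    ```sql
    -- Hypothetical stand-in for the Messages table, filled with the rows shown above
    DECLARE @Messages TABLE ( Message CHAR(1), [Timestamp] DATETIME );

    INSERT  INTO @Messages
    VALUES  ( 'A', '2015-03-29T00:00:00' ),
            ( 'A', '2015-03-29T00:01:00' ),
            ( 'B', '2015-03-29T00:02:00' ),
            ( 'A', '2015-03-29T00:03:00' ),
            ( 'A', '2015-03-29T00:04:00' ),
            ( 'A', '2015-03-29T00:05:00' ),
            ( 'B', '2015-03-29T00:06:00' );

    WITH cte AS (
        -- gb = per-message row number minus overall row number;
        -- it stays constant within a run of consecutive equal messages
        SELECT  Message, [Timestamp],
                ROW_NUMBER() OVER (PARTITION BY Message ORDER BY [Timestamp])
              - ROW_NUMBER() OVER (ORDER BY [Timestamp]) AS gb
        FROM    @Messages
    )
    SELECT  Message, COUNT(*) AS Cnt
    FROM    cte
    GROUP BY Message, gb          -- gb alone is not enough; Message must be part of the key
    ORDER BY MIN([Timestamp]);    -- list the runs in chronological order
    ```

    On the sample rows this gives the same four groups as before: A 2, B 1, A 3, B 1.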
    