Partial sum between different records using SQL 2008

Question

I'm trying to solve this problem in SQL Server 2008. I have a table like this:

DECLARE @table TABLE (
    TimeStamp        DATETIME,
    val              INT,
    typerow          VARCHAR(3)
);

INSERT INTO @table(TimeStamp, val, typerow)
VALUES
   ('2018-06-03 13:30:00.000', 6, 'out'),
   ('2018-06-03 14:10:00.000', 8, 'out'),
   ('2018-06-03 14:30:00.000', 3, 'in'),
   ('2018-06-03 15:00:00.000', 9, 'out'),
   ('2018-06-03 15:30:00.000', 4, 'out'),
   ('2018-06-03 16:00:00.000', 2, 'out'),
   ('2018-06-03 17:05:00.000', 8, 'in'),
   ('2018-06-03 17:30:00.000', 0, 'out'),
   ('2018-06-03 18:15:00.000', 7, 'out'),
   ('2018-06-03 18:30:00.000', 1, 'in'),
   ('2018-06-03 19:00:00.000', 5, 'out')

The table contains distinct TimeStamp values, each with a corresponding val, and a typerow column that is either 'in' or 'out'.

With @table sorted by TimeStamp ascending, I need a way to produce a table in which every row with typerow = 'in' holds, in its val column, its own value plus the sum of val over all preceding rows with typerow = 'out', back to (but not including) the previous typerow = 'in' row. For the first typerow = 'in' row, the sum extends back to the first row of @table:

2018-06-03 13:30:00.000    6      out
2018-06-03 14:10:00.000    8      out
2018-06-03 14:30:00.000    17     in  -- 6 + 8 + 3
2018-06-03 15:00:00.000    9      out
2018-06-03 15:30:00.000    4      out
2018-06-03 16:00:00.000    2      out
2018-06-03 17:05:00.000    23     in  -- 9 + 4 + 2 + 8
2018-06-03 17:30:00.000    0      out
2018-06-03 18:15:00.000    7      out
2018-06-03 18:30:00.000    8      in  -- 0 + 7 + 1
2018-06-03 19:00:00.000    5      out

Since @table will contain hundreds of rows like these, my first idea is to add a new id column and assign the same id to all rows that belong to the same sum (maybe this can be done with a recursive CTE?), to get this result:

2018-06-03 13:30:00.000    6      out    1
2018-06-03 14:10:00.000    8      out    1
2018-06-03 14:30:00.000    17     in     1
2018-06-03 15:00:00.000    9      out    2
2018-06-03 15:30:00.000    4      out    2
2018-06-03 16:00:00.000    2      out    2
2018-06-03 17:05:00.000    23     in     2
2018-06-03 17:30:00.000    0      out    3
2018-06-03 18:15:00.000    7      out    3
2018-06-03 18:30:00.000    8      in     3
2018-06-03 19:00:00.000    5      out    don't care for this element

and have a new column like

SELECT SUM(val) OVER (PARTITION BY id ORDER BY id) AS partial_sum

and then update the val column with partial_sum where typerow = 'in'. I don't know how to create the new id column correctly, or whether this is a good solution given my SQL Server version.

Thanks in advance for your support, any suggestion is appreciated.

Answer 1:

This is a gaps-and-islands problem, where each island ends with an "in" record, and you want to sum the values in each island.

Here is one approach that uses the count of following "in"s to define the group, and then a window sum over each group.

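select timestamp,
    case when typerow = 'out'
        then val
        -- running sum within the group; on the closing 'in' row it equals the group total
        else sum(val) over(partition by grp order by timestamp)
    end as val,
    typerow
from (
    select t.*,
        -- grp = number of 'in' rows at or after this row
        sum(case when typerow = 'in' then 1 else 0 end) over(order by timestamp desc) grp
    from @table t
) t
order by timestamp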

Demo on DB Fiddle:

timestamp               | val | typerow
:---------------------- | --: | :------
2018-06-03 13:30:00.000 |   6 | out    
2018-06-03 14:10:00.000 |   8 | out    
2018-06-03 14:30:00.000 |  17 | in     
2018-06-03 15:00:00.000 |   9 | out    
2018-06-03 15:30:00.000 |   4 | out    
2018-06-03 16:00:00.000 |   2 | out    
2018-06-03 17:05:00.000 |  23 | in     
2018-06-03 17:30:00.000 |   0 | out    
2018-06-03 18:15:00.000 |   7 | out    
2018-06-03 18:30:00.000 |   8 | in     
2018-06-03 19:00:00.000 |   5 | out
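
One caveat on versions: ORDER BY inside an aggregate window function (as in SUM(...) OVER (ORDER BY ...)) is only available from SQL Server 2012 onward, so the query above would likely be rejected on an actual 2008 instance. Below is a minimal sketch of the same grouping idea that should run on 2008, assuming the sample data from the question. The CTE name grouped and the alias p are illustrative choices, not part of the original answer. It replaces the running count with a correlated COUNT and uses SUM(...) OVER (PARTITION BY grp) without ORDER BY, which 2008 does support.

```sql
-- Sample data from the question
DECLARE @table TABLE (
    TimeStamp DATETIME,
    val       INT,
    typerow   VARCHAR(3)
);

INSERT INTO @table (TimeStamp, val, typerow)
VALUES
   ('2018-06-03 13:30:00.000', 6, 'out'),
   ('2018-06-03 14:10:00.000', 8, 'out'),
   ('2018-06-03 14:30:00.000', 3, 'in'),
   ('2018-06-03 15:00:00.000', 9, 'out'),
   ('2018-06-03 15:30:00.000', 4, 'out'),
   ('2018-06-03 16:00:00.000', 2, 'out'),
   ('2018-06-03 17:05:00.000', 8, 'in'),
   ('2018-06-03 17:30:00.000', 0, 'out'),
   ('2018-06-03 18:15:00.000', 7, 'out'),
   ('2018-06-03 18:30:00.000', 1, 'in'),
   ('2018-06-03 19:00:00.000', 5, 'out');

-- grp = number of 'in' rows at or after the current row, so each 'in' row
-- shares its grp value with the 'out' rows that lead up to it
WITH grouped AS (
    SELECT t.TimeStamp, t.val, t.typerow,
           (SELECT COUNT(*)
            FROM @table p
            WHERE p.typerow = 'in'
              AND p.TimeStamp >= t.TimeStamp) AS grp
    FROM @table t
)
SELECT TimeStamp,
       CASE WHEN typerow = 'in'
            THEN SUM(val) OVER (PARTITION BY grp)  -- whole-group total, no ORDER BY needed
            ELSE val
       END AS val,
       typerow
FROM grouped
ORDER BY TimeStamp;
```

On the sample data this should give the same rows as the demo output above (17, 23 and 8 on the 'in' rows). The correlated COUNT scans the table once per row, which should be acceptable for a few hundred records but may need an index on TimeStamp for larger tables.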

Source: https://stackoverflow.com/questions/64491255/partial-sum-between-different-records-using-sql-2008

Tags

sql

sql-server

sql-server-2008

sum

gaps-and-islands