SQL Select Statement For Calculating A Running Average Column

-上瘾入骨i 2020-12-11 04:53

I am trying to add a running average column to a SELECT statement, computed from a column in the n previous rows of the same SELECT statement. The average I need is based on

7 answers
  •  爱一瞬间的悲伤
    2020-12-11 05:42

    A simple self join seems to perform much better than a row-referencing subquery.

    Generate 10k rows of test data:

    -- Drop any previous copy so the script can be rerun
    IF OBJECT_ID('test10k', 'U') IS NOT NULL
        DROP TABLE test10k;
    create table test10k (Id int, Number int, constraint test10k_cpk primary key clustered (Id));
    
    ;WITH digits AS (
        SELECT 0 as Number
        UNION SELECT 1
        UNION SELECT 2
        UNION SELECT 3
        UNION SELECT 4
        UNION SELECT 5
        UNION SELECT 6
        UNION SELECT 7
        UNION SELECT 8
        UNION SELECT 9
    )
    ,numbers as (
        SELECT 
            (thousands.Number * 1000) 
            + (hundreds.Number * 100) 
            + (tens.Number * 10) 
            + ones.Number AS Number
        FROM digits AS ones 
        CROSS JOIN digits AS tens
        CROSS JOIN digits AS hundreds
        CROSS JOIN digits AS thousands
    )
    insert test10k (Id, Number)
    select Number, Number
    from numbers 
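
    A quick sanity check on the generated data can be sketched as follows. The column aliases are my own; the expected values follow from the four-way cross join of the digits 0 to 9, which yields every integer from 0 to 9999 exactly once.

    ```sql
    -- Expect RowCnt = 10000, MinId = 0, MaxId = 9999
    SELECT COUNT(*) AS RowCnt,
           MIN(Id)  AS MinId,
           MAX(Id)  AS MaxId
    FROM   test10k;
    ```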
    

    I would pull the special case of the first 3 rows out of the main query; you can UNION ALL those back in if you really want them in the row set. Self join query:

    ;WITH   NumberedRows
    AS
    (
        SELECT  rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
        FROM    test10k rta
    )
    
    SELECT  nr.ID, nr.Number,
        avg(trailing.Number) as MovingAverage
    FROM    NumberedRows nr
        join NumberedRows as trailing on trailing.RowNumber between nr.RowNumber-3 and nr.RowNumber-1
    where nr.RowNumber > 3  -- skip the first 3 rows, which lack a full 3-row trailing window
    group by nr.id, nr.Number
    

    On my machine this takes about 10 seconds; the subquery approach that Aaron Alton demonstrated takes about 45 seconds (after I changed it to reflect my test source table):

    ;WITH   NumberedRows
    AS
    (
        SELECT  rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
        FROM    test10k rta
    )
    SELECT  nr.ID, nr.Number,
        CASE
                WHEN nr.RowNumber <=3 THEN NULL
                ELSE (  SELECT  avg(Number) 
                                FROM    NumberedRows 
                                WHERE   RowNumber < nr.RowNumber
                                AND             RowNumber >= nr.RowNumber - 3
                        )
        END AS MovingAverage
    FROM    NumberedRows nr
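
    If the self join should return the same row set as the subquery version, the first 3 rows can be added back with UNION ALL, as mentioned earlier. The following is a minimal sketch of that idea, not a benchmarked query. It assumes the cutoff is meant to be the row position (RowNumber), mirroring the CASE expression in the subquery version, and it returns NULL for the rows that lack a full 3-row trailing window.

    ```sql
    ;WITH NumberedRows AS
    (
        SELECT  rta.*, ROW_NUMBER() OVER (ORDER BY rta.Id ASC) AS RowNumber
        FROM    test10k rta
    )
    -- First 3 rows: no full trailing window, so report NULL
    SELECT  nr.Id, nr.Number, CAST(NULL AS int) AS MovingAverage
    FROM    NumberedRows nr
    WHERE   nr.RowNumber <= 3

    UNION ALL

    -- Remaining rows: average of the 3 preceding rows via the self join
    SELECT  nr.Id, nr.Number, AVG(trailing.Number) AS MovingAverage
    FROM    NumberedRows nr
            JOIN NumberedRows AS trailing
              ON trailing.RowNumber BETWEEN nr.RowNumber - 3 AND nr.RowNumber - 1
    WHERE   nr.RowNumber > 3
    GROUP BY nr.Id, nr.Number
    ORDER BY Id;
    ```

    Note that AVG over an int column returns an int in SQL Server, so the averages are truncated; use AVG(trailing.Number * 1.0) if fractional results are needed.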
    

    If you run SET STATISTICS PROFILE ON, you can see that the self join has 10k executes on the table spool, while the subquery has 10k executes on the filter, aggregate, and other steps.
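
    To repeat the comparison on your own machine, each candidate query can be wrapped in the statistics settings as sketched below. The query under test here is a third candidate that was not part of the timings above: a window-frame version, which assumes SQL Server 2012 or later (where ROWS BETWEEN framing is available). I have not measured it, so treat it only as something to try alongside the other two.

    ```sql
    SET STATISTICS TIME ON;     -- report CPU and elapsed time per statement
    SET STATISTICS PROFILE ON;  -- return the executed plan with per-operator Rows and Executes

    -- Candidate under test: trailing average of the 3 previous rows using a window frame.
    -- The CASE keeps the first 3 rows NULL, matching the subquery version.
    SELECT  t.Id, t.Number,
            CASE
                WHEN ROW_NUMBER() OVER (ORDER BY t.Id) <= 3 THEN NULL
                ELSE AVG(t.Number) OVER (ORDER BY t.Id
                                         ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
            END AS MovingAverage
    FROM    test10k t;

    SET STATISTICS PROFILE OFF;
    SET STATISTICS TIME OFF;
    ```

    Swap the SELECT for the self join or the subquery query to compare the Executes column of each plan.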
