SQL Select Statement For Calculating A Running Average Column

-上瘾入骨i 2020-12-11 04:53

I am trying to add a running average column to a SELECT statement, based on a column from the n previous rows of the same SELECT statement. The average I need is based on

7 answers
  • This should do it:

    -- Test data
    CREATE TABLE RowsToAverage
    (
        ID     int NOT NULL,
        Number int NOT NULL
    );

    INSERT RowsToAverage (ID, Number)
    SELECT 1, 1
    UNION ALL SELECT 2, 3
    UNION ALL SELECT 3, 2
    UNION ALL SELECT 4, 4
    UNION ALL SELECT 5, 6
    UNION ALL SELECT 6, 8
    UNION ALL SELECT 7, 10;

    -- The query: number the rows in ID order, then average the 3 rows
    -- immediately before each row with a correlated subquery
    ;WITH NumberedRows AS
    (
        SELECT rta.*, ROW_NUMBER() OVER (ORDER BY rta.ID ASC) AS RowNumber
        FROM   RowsToAverage rta
    )
    SELECT nr.ID, nr.Number,
           CASE
               -- The first 3 rows do not have 3 predecessors
               WHEN nr.RowNumber <= 3 THEN NULL
               -- Number * 1.0 avoids integer AVG, which truncates in SQL Server
               ELSE (SELECT AVG(Number * 1.0)
                     FROM   NumberedRows
                     WHERE  RowNumber <  nr.RowNumber
                       AND  RowNumber >= nr.RowNumber - 3)
           END AS MovingAverage
    FROM NumberedRows nr;
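
    To check the query above without a SQL Server instance, the sketch below runs the same CTE and correlated subquery through Python's built-in `sqlite3` module (an assumption of this example; SQLite 3.25 or later is needed for `ROW_NUMBER()`). It also compares the result with a window-frame version using `ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING`, which is not part of the answer but is supported by SQL Server 2012 and later. Note that SQLite's `AVG` always returns a floating-point value, whereas SQL Server's `AVG` over an `int` column returns an `int`.

```python
import sqlite3

# Sample data from the answer: (ID, Number)
ROWS = [(1, 1), (2, 3), (3, 2), (4, 4), (5, 6), (6, 8), (7, 10)]

# The answer's approach: number the rows, then average the 3 preceding rows
# with a correlated subquery; the first 3 rows get NULL.
SUBQUERY_SQL = """
WITH NumberedRows AS (
    SELECT rta.*, ROW_NUMBER() OVER (ORDER BY rta.ID ASC) AS RowNumber
    FROM RowsToAverage rta
)
SELECT nr.ID, nr.Number,
       CASE
           WHEN nr.RowNumber <= 3 THEN NULL
           ELSE (SELECT AVG(Number)
                 FROM NumberedRows
                 WHERE RowNumber < nr.RowNumber
                   AND RowNumber >= nr.RowNumber - 3)
       END AS MovingAverage
FROM NumberedRows nr
ORDER BY nr.ID
"""

# Alternative (not from the answer): a window frame covering the 3 rows
# before the current one, with the same NULL rule for rows 1-3.
WINDOW_SQL = """
SELECT ID, Number,
       CASE
           WHEN ROW_NUMBER() OVER (ORDER BY ID) <= 3 THEN NULL
           ELSE AVG(Number) OVER (ORDER BY ID
                                  ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
       END AS MovingAverage
FROM RowsToAverage
ORDER BY ID
"""


def moving_averages(sql):
    """Load the sample rows into an in-memory database and run `sql`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE RowsToAverage (ID INTEGER NOT NULL, Number INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO RowsToAverage (ID, Number) VALUES (?, ?)", ROWS)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in moving_averages(SUBQUERY_SQL):
        print(row)
```

    Both queries return NULL for IDs 1 to 3 and 2.0, 3.0, 4.0, 6.0 for IDs 4 to 7.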
    
  • 2020-12-11 05:33

    Alternatively, you can denormalize and store precalculated running values, as described here:

    http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/23/denormalizing-to-enforce-business-rules-running-totals.aspx

    Selects are as fast as they can be. Of course, modifications are slower, because the stored values have to be kept up to date.
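
    One way to see why stored running values make reads cheap: if each row keeps a running total, the average of the 3 rows before row n is (total at n-1 minus total at n-4) / 3, two lookups instead of a scan. The sketch below is a hypothetical schema in Python's `sqlite3`, not taken from the linked article; it assumes IDs start at 1, have no gaps, and are only ever appended. Changing an earlier row would mean recomputing every later total, which is where the slower modifications come from.

```python
import sqlite3

WINDOW = 3  # number of preceding rows to average


def build_table(rows):
    """Create a table whose RunningTotal column is filled in at insert time.

    Hypothetical schema: assumes IDs arrive in order, start at 1, and have no gaps.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Running (ID INTEGER PRIMARY KEY, Number INTEGER NOT NULL, "
        "RunningTotal INTEGER NOT NULL)"
    )
    for row_id, number in rows:
        # Stored total = this row's Number + the previous row's stored total (0 for row 1)
        conn.execute(
            "INSERT INTO Running (ID, Number, RunningTotal) "
            "VALUES (?, ?, ? + COALESCE((SELECT RunningTotal FROM Running WHERE ID = ? - 1), 0))",
            (row_id, number, number, row_id),
        )
    return conn


def moving_averages(conn):
    """Average of the WINDOW preceding rows, computed from two stored totals per row."""
    return conn.execute(
        """
        SELECT cur.ID,
               (prev1.RunningTotal - COALESCE(prevN.RunningTotal, 0)) * 1.0 / ?
        FROM Running cur
        JOIN Running prev1 ON prev1.ID = cur.ID - 1
        LEFT JOIN Running prevN ON prevN.ID = cur.ID - 1 - ?
        WHERE cur.ID > ?
        ORDER BY cur.ID
        """,
        (WINDOW, WINDOW, WINDOW),
    ).fetchall()


conn = build_table([(1, 1), (2, 3), (3, 2), (4, 4), (5, 6), (6, 8), (7, 10)])
print(moving_averages(conn))
```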

  • 2020-12-11 05:36

    Check out some solutions here. I'm sure that you could adapt one of them easily enough.

  • 2020-12-11 05:37

    If you want this to be truly performant, and aren't afraid to dig into a seldom-used area of SQL Server, you should look into writing a custom aggregate function. SQL Server 2005 and 2008 brought CLR integration to the table, including the ability to write user-defined aggregate functions. A custom running total aggregate would be by far the most efficient way to calculate a running average like this.
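
    SQL Server CLR aggregates are written in a .NET language as a class with Init, Accumulate, Merge and Terminate methods, which does not fit a short runnable example. As a rough analogue of the same idea, Python's `sqlite3` lets you register a user-defined aggregate with `Connection.create_aggregate`, passing a class with `step` and `finalize` methods. The hypothetical `trailing_avg` aggregate below is fed the 3 preceding rows of each row through a self-join and builds the "NULL until there are 3 values" rule into the aggregate itself; it only illustrates the shape of a custom aggregate and says nothing about performance.

```python
import sqlite3


class TrailingAvg:
    """Hypothetical user-defined aggregate: the average of the values it receives,
    or NULL if it received fewer than `required` non-NULL values."""

    required = 3

    def __init__(self):
        # Per-group state, similar to what a CLR aggregate sets up in Init()
        self.total = 0
        self.count = 0

    def step(self, value):
        # Called once per input row (the counterpart of Accumulate);
        # a LEFT JOIN with no match passes NULL, which is ignored here
        if value is not None:
            self.total += value
            self.count += 1

    def finalize(self):
        # Called once per group (the counterpart of Terminate)
        if self.count < self.required:
            return None
        return self.total / self.count


conn = sqlite3.connect(":memory:")
conn.create_aggregate("trailing_avg", 1, TrailingAvg)
conn.execute("CREATE TABLE RowsToAverage (ID INTEGER PRIMARY KEY, Number INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO RowsToAverage (ID, Number) VALUES (?, ?)",
    [(1, 1), (2, 3), (3, 2), (4, 4), (5, 6), (6, 8), (7, 10)],
)

# Feed each row's 3 predecessors (by ID, assumed gap-free) to the custom aggregate
query = """
SELECT cur.ID, cur.Number, trailing_avg(prev.Number) AS MovingAverage
FROM RowsToAverage cur
LEFT JOIN RowsToAverage prev
       ON prev.ID BETWEEN cur.ID - 3 AND cur.ID - 1
GROUP BY cur.ID, cur.Number
ORDER BY cur.ID
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
```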

  • 2020-12-11 05:41

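    Assuming that the Id column is sequential with no gaps, here's a simplified query for a table named "MyTable":

    SELECT
        b.Id,
        b.Number,
        (
            -- Average of the 3 rows whose Id is just below b.Id;
            -- b.Id > 3 makes the subquery empty (so AVG returns NULL) for the first 3 rows.
            -- Number * 1.0 avoids integer AVG, which truncates in SQL Server.
            SELECT AVG(a.Number * 1.0)
            FROM   MyTable a
            WHERE  a.Id >= b.Id - 3
              AND  a.Id <  b.Id
              AND  b.Id >  3
        ) AS Average
    FROM MyTable b;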
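
    The "sequential Id" assumption matters because this query windows on Id values, not on rows: a gap in the Ids shrinks the window. The sketch below uses Python's `sqlite3` and made-up data with Id 5 missing to compare the query above with a `ROW_NUMBER()`-based window (SQLite 3.25 or later assumed).

```python
import sqlite3

# Made-up data with a gap: Id 5 is missing
ROWS = [(1, 1), (2, 6), (3, 2), (4, 4), (6, 5)]

# The answer's query: the window is defined by Id arithmetic
ID_WINDOW_SQL = """
SELECT b.Id, b.Number,
       (SELECT AVG(a.Number)
        FROM MyTable a
        WHERE a.Id >= (b.Id - 3)
          AND a.Id < b.Id
          AND b.Id > 3) AS Average
FROM MyTable b
ORDER BY b.Id
"""

# Same idea, but the window is defined by row position, so gaps do not matter
ROW_WINDOW_SQL = """
WITH Numbered AS (
    SELECT Id, Number, ROW_NUMBER() OVER (ORDER BY Id) AS RowNumber
    FROM MyTable
)
SELECT b.Id, b.Number,
       (SELECT AVG(a.Number)
        FROM Numbered a
        WHERE a.RowNumber >= b.RowNumber - 3
          AND a.RowNumber < b.RowNumber
          AND b.RowNumber > 3) AS Average
FROM Numbered b
ORDER BY b.Id
"""


def run(sql):
    """Load ROWS into an in-memory MyTable and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (Id INTEGER PRIMARY KEY, Number INTEGER NOT NULL)")
    conn.executemany("INSERT INTO MyTable (Id, Number) VALUES (?, ?)", ROWS)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


for sql in (ID_WINDOW_SQL, ROW_WINDOW_SQL):
    print(run(sql))
```

    For Id 6 the Id-based query averages only Ids 3 and 4 (3.0), while the row-based version averages the 3 preceding rows, Ids 2, 3 and 4 (4.0).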
    
  • 2020-12-11 05:42

    A simple self-join seems to perform much better than a correlated subquery that references each row.

    Generate 10k rows of test data:

    -- Drop the table if it is left over from a previous run
    IF OBJECT_ID('dbo.test10k', 'U') IS NOT NULL
        DROP TABLE dbo.test10k;

    CREATE TABLE test10k
    (
        Id     int,
        Number int,
        CONSTRAINT test10k_cpk PRIMARY KEY CLUSTERED (Id)
    );

    -- Cross join four copies of the digits 0-9 to build the numbers 0..9999
    ;WITH digits AS (
        SELECT 0 AS Number
        UNION SELECT 1
        UNION SELECT 2
        UNION SELECT 3
        UNION SELECT 4
        UNION SELECT 5
        UNION SELECT 6
        UNION SELECT 7
        UNION SELECT 8
        UNION SELECT 9
    ),
    numbers AS (
        SELECT (thousands.Number * 1000)
             + (hundreds.Number * 100)
             + (tens.Number * 10)
             + ones.Number AS Number
        FROM digits AS ones
        CROSS JOIN digits AS tens
        CROSS JOIN digits AS hundreds
        CROSS JOIN digits AS thousands
    )
    INSERT test10k (Id, Number)
    SELECT Number, Number
    FROM numbers;
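
    The digits CTE builds numbers by place value: each of the four cross-joined copies of 0-9 supplies one decimal digit, so the 10 × 10 × 10 × 10 combinations map one-to-one onto 0 through 9999. A quick check of that with Python's `sqlite3`, which accepts the same CTE syntax (the `check_numbers` helper is hypothetical), might look like:

```python
import sqlite3

# Same digits/numbers CTE as the answer, run in SQLite to inspect the generated values
SQL = """
WITH digits AS (
    SELECT 0 AS Number
    UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4
    UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
),
numbers AS (
    SELECT (thousands.Number * 1000)
         + (hundreds.Number * 100)
         + (tens.Number * 10)
         + ones.Number AS Number
    FROM digits AS ones
    CROSS JOIN digits AS tens
    CROSS JOIN digits AS hundreds
    CROSS JOIN digits AS thousands
)
SELECT COUNT(*), COUNT(DISTINCT Number), MIN(Number), MAX(Number)
FROM numbers
"""


def check_numbers():
    """Return (row count, distinct count, min, max) of the generated numbers."""
    conn = sqlite3.connect(":memory:")
    stats = conn.execute(SQL).fetchone()
    conn.close()
    return stats


print(check_numbers())
```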
    

    I would pull the special case of the first 3 rows out of the main query; you can UNION ALL them back in if you really want them in the row set. Self-join query:

    ;WITH NumberedRows AS
    (
        SELECT rta.*, ROW_NUMBER() OVER (ORDER BY rta.ID ASC) AS RowNumber
        FROM   test10k rta
    )
    SELECT nr.ID, nr.Number,
           AVG(trailing.Number * 1.0) AS MovingAverage
    FROM NumberedRows nr
    -- Join each row to the 3 rows immediately before it
    JOIN NumberedRows AS trailing
        ON trailing.RowNumber BETWEEN nr.RowNumber - 3 AND nr.RowNumber - 1
    -- Skip the first 3 rows, which have no full 3-row window
    WHERE nr.RowNumber > 3
    GROUP BY nr.ID, nr.Number;
    

    On my machine this takes about 10 seconds, while the subquery approach that Aaron Alton demonstrated takes about 45 seconds (after I changed it to reflect my test source table):

    ;WITH NumberedRows AS
    (
        SELECT rta.*, ROW_NUMBER() OVER (ORDER BY rta.ID ASC) AS RowNumber
        FROM   test10k rta
    )
    SELECT nr.ID, nr.Number,
           CASE
               WHEN nr.RowNumber <= 3 THEN NULL
               ELSE (SELECT AVG(Number * 1.0)
                     FROM   NumberedRows
                     WHERE  RowNumber <  nr.RowNumber
                       AND  RowNumber >= nr.RowNumber - 3)
           END AS MovingAverage
    FROM NumberedRows nr;
    

    If you run SET STATISTICS PROFILE ON, you can see that the self-join has 10k executes on the table spool, while the subquery has 10k executes on the filter, aggregate, and other steps.
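
    Whichever query is faster on a given server, the two should return the same averages. The sketch below checks that with Python's `sqlite3` on a smaller made-up table (200 rows; the `Number` values follow an arbitrary pattern so the averages vary), restricting both queries to rows with `RowNumber > 3`.

```python
import sqlite3

# Self-join version: join each row to its 3 predecessors and group
SELF_JOIN_SQL = """
WITH NumberedRows AS (
    SELECT rta.*, ROW_NUMBER() OVER (ORDER BY rta.Id) AS RowNumber
    FROM test rta
)
SELECT nr.Id, AVG(trailing.Number) AS MovingAverage
FROM NumberedRows nr
JOIN NumberedRows AS trailing
  ON trailing.RowNumber BETWEEN nr.RowNumber - 3 AND nr.RowNumber - 1
WHERE nr.RowNumber > 3
GROUP BY nr.Id
ORDER BY nr.Id
"""

# Correlated-subquery version, restricted to the same rows
SUBQUERY_SQL = """
WITH NumberedRows AS (
    SELECT rta.*, ROW_NUMBER() OVER (ORDER BY rta.Id) AS RowNumber
    FROM test rta
)
SELECT nr.Id,
       (SELECT AVG(Number)
        FROM NumberedRows
        WHERE RowNumber < nr.RowNumber
          AND RowNumber >= nr.RowNumber - 3) AS MovingAverage
FROM NumberedRows nr
WHERE nr.RowNumber > 3
ORDER BY nr.Id
"""


def compare(n_rows=200):
    """Load n_rows of made-up data and return (self-join rows, subquery rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (Id INTEGER PRIMARY KEY, Number INTEGER NOT NULL)")
    conn.executemany(
        "INSERT INTO test (Id, Number) VALUES (?, ?)",
        [(i, (i * 7) % 11) for i in range(n_rows)],  # arbitrary varying values
    )
    joined = conn.execute(SELF_JOIN_SQL).fetchall()
    correlated = conn.execute(SUBQUERY_SQL).fetchall()
    conn.close()
    return joined, correlated


joined, correlated = compare()
print(len(joined), joined == correlated)
```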
