Window moving average in sql server

╄→尐↘猪︶ㄣ 提交于 2019-12-08 05:06:26

问题


I am trying to create a function that computes a windowed moving average in SQLServer 2008. I am quite new to SQL so I am having a fair bit of difficulty. The data that I am trying to perform the moving average on needs to be grouped by day (it is all timestamped data) and then a variable moving average window needs to be applied to it.

I already have a function that groups the data by day (and @id) which is shown at the bottom. I have a few questions:

Would it be better to call the grouping function inside the moving average function or should I do it all at once?

Is it possible to get the moving average for the dates input into the function, but go back n days to begin the moving average so that the first n days of the returned data will not have 0 for their average? (ie. if they want a 7 day moving average from 01-08-2011 to 02-08-2011 that I start the moving average calculation on 01-01-2011 so that the first day they defined has a value?)

I am in the process of looking into how to do the moving average, and know that a moving window seems to be the best option (currentSum = prevSum + todayCount - nthDayAgoCount) / nDays but I am still working on figuring out the SQL implementation of this.

I have a grouping function that looks like this (some variables removed for visibility purposes):

    SELECT
        'ALL' as GeogType,
        CAST(v.AdmissionOn as date) as dtAdmission,    
        CASE WHEN @id IS NULL THEN 99 ELSE v.ID END,
        COUNT(*) as nVisits
    FROM dbo.Table1 v INNER JOIN dbo.Table2 t ON v.FSLDU = t.FSLDU5
    WHERE v.AdmissionOn >= '01-01-2010' AND v.AdmissionOn < DATEADD(day,1,'02-01-2010') 
          AND v.ID = Coalesce(@id,ID)
    GROUP BY    
        CAST(v.AdmissionOn as date),
        CASE WHEN @id IS NULL THEN 99 ELSE v.ID END
    ORDER BY 2,3,4

Which returns a table like so:

ALL 2010-01-01  1   103
ALL 2010-01-02  1   114
ALL 2010-01-03  1   86
ALL 2010-01-04  1   88
ALL 2010-01-05  1   84
ALL 2010-01-06  1   87
ALL 2010-01-07  1   82

EDIT: To answer the first question I asked:

I ended up creating a function which declared a temporary table and inserted the results from the count function into it, then used the example from user662852 to compute the moving average.


回答1:


Take the hardcoded date range out of your query. Write the output (like your sample at the end) to a temp table (I called it #visits below).
Try this self join to the temp table:

 Select list.dtadmission
   , AVG(data.nvisits) as Avg
   , SUM(data.nvisits) as sum
   , COUNT(data.nvisits) as RollingDayCount
   , MIN(data.dtadmission) as Verifymindate
   , MAX(data.dtadmission)   as Verifymaxdate
 from  #visits as list 
 inner join #visits as data  
 on list.dtadmission between data.dtadmission and DATEADD(DD,6,data.dtadmission) group by list.dtadmission

EDIT: I didn't have enough room in Comments to say this in response to your question:

My join is "kinda cartesian" because it uses a between in the join constraint. Each record in list is going up against every other record, and then I want the ones where the date I report is between a lower bound of (-7) days and today. Every data date is available to list date, this is the key to your question. I could have written the join condition as

list.dtadmission between DATEADD(DD,-6,data.dtadmission) and data.dtadmission

But what really happened was I tested it as

list.dtadmission between DATEADD(DD,6,data.dtadmission) and data.dtadmission

Which returns no records because the syntax is "Between LOW and HIGH". I facepalmed on 0 records and swapped the arguments, that's all.

Try the following, see what I mean: This is the cartesian join for just one listdate:

 SELECT 
 list.[dtAdmission] as listdate
 ,data.[dtAdmission] as datadate
 ,data.nVisits as datadata
 ,DATEADD(dd,6,list.dtadmission) as listplus6 
 ,DATEADD(dd,6,data.dtAdmission ) as datapplus6 
 from  [sandbox].[dbo].[admAvg] as list inner join [sandbox].[dbo].[admAvg] as data    
 on 
 1=1
 where list.dtAdmission = '5-Jan-2011'

Compare this to the actual join condition

 SELECT 
      list.[dtAdmission] as listdate
      ,data.[dtAdmission] as datadate
      ,data.nVisits as datadata
      ,DATEADD(dd,6,list.dtadmission) as listplus6 
      ,DATEADD(dd,6,data.dtAdmission ) as datapplus6
from  [sandbox].[dbo].[admAvg] as list   inner join [sandbox].[dbo].[admAvg] as data    
on 
list.dtadmission between data.dtadmission and DATEADD(DD,6,data.dtadmission)
where list.dtAdmission = '5-Jan-2011'

See how list date is between datadate and dataplus6 in all the records?



来源:https://stackoverflow.com/questions/5407273/window-moving-average-in-sql-server

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!