问题
I am trying to create a function that computes a windowed moving average in SQLServer 2008. I am quite new to SQL so I am having a fair bit of difficulty. The data that I am trying to perform the moving average on needs to be grouped by day (it is all timestamped data) and then a variable moving average window needs to be applied to it.
I already have a function that groups the data by day (and @id) which is shown at the bottom. I have a few questions:
Would it be better to call the grouping function inside the moving average function or should I do it all at once?
Is it possible to get the moving average for the dates input into the function, but go back n days to begin the moving average so that the first n days of the returned data will not have 0 for their average? (ie. if they want a 7 day moving average from 01-08-2011 to 02-08-2011 that I start the moving average calculation on 01-01-2011 so that the first day they defined has a value?)
I am in the process of looking into how to do the moving average, and know that a moving window seems to be the best option (currentSum = prevSum + todayCount - nthDayAgoCount) / nDays but I am still working on figuring out the SQL implementation of this.
I have a grouping function that looks like this (some variables removed for visibility purposes):
SELECT
'ALL' as GeogType,
CAST(v.AdmissionOn as date) as dtAdmission,
CASE WHEN @id IS NULL THEN 99 ELSE v.ID END,
COUNT(*) as nVisits
FROM dbo.Table1 v INNER JOIN dbo.Table2 t ON v.FSLDU = t.FSLDU5
WHERE v.AdmissionOn >= '01-01-2010' AND v.AdmissionOn < DATEADD(day,1,'02-01-2010')
AND v.ID = Coalesce(@id,ID)
GROUP BY
CAST(v.AdmissionOn as date),
CASE WHEN @id IS NULL THEN 99 ELSE v.ID END
ORDER BY 2,3,4
Which returns a table like so:
ALL 2010-01-01 1 103
ALL 2010-01-02 1 114
ALL 2010-01-03 1 86
ALL 2010-01-04 1 88
ALL 2010-01-05 1 84
ALL 2010-01-06 1 87
ALL 2010-01-07 1 82
EDIT: To answer the first question I asked:
I ended up creating a function which declared a temporary table and inserted the results from the count function into it, then used the example from user662852
to compute the moving average.
回答1:
Take the hardcoded date range out of your query. Write the output (like your sample at the end) to a temp table (I called it #visits below).
Try this self join to the temp table:
Select list.dtadmission
, AVG(data.nvisits) as Avg
, SUM(data.nvisits) as sum
, COUNT(data.nvisits) as RollingDayCount
, MIN(data.dtadmission) as Verifymindate
, MAX(data.dtadmission) as Verifymaxdate
from #visits as list
inner join #visits as data
on list.dtadmission between data.dtadmission and DATEADD(DD,6,data.dtadmission) group by list.dtadmission
EDIT: I didn't have enough room in Comments to say this in response to your question:
My join is "kinda cartesian" because it uses a between in the join constraint. Each record in list is going up against every other record, and then I want the ones where the date I report is between a lower bound of (-7) days and today. Every data date is available to list date, this is the key to your question. I could have written the join condition as
list.dtadmission between DATEADD(DD,-6,data.dtadmission) and data.dtadmission
But what really happened was I tested it as
list.dtadmission between DATEADD(DD,6,data.dtadmission) and data.dtadmission
Which returns no records because the syntax is "Between LOW and HIGH". I facepalmed on 0 records and swapped the arguments, that's all.
Try the following, see what I mean: This is the cartesian join for just one listdate:
SELECT
list.[dtAdmission] as listdate
,data.[dtAdmission] as datadate
,data.nVisits as datadata
,DATEADD(dd,6,list.dtadmission) as listplus6
,DATEADD(dd,6,data.dtAdmission ) as datapplus6
from [sandbox].[dbo].[admAvg] as list inner join [sandbox].[dbo].[admAvg] as data
on
1=1
where list.dtAdmission = '5-Jan-2011'
Compare this to the actual join condition
SELECT
list.[dtAdmission] as listdate
,data.[dtAdmission] as datadate
,data.nVisits as datadata
,DATEADD(dd,6,list.dtadmission) as listplus6
,DATEADD(dd,6,data.dtAdmission ) as datapplus6
from [sandbox].[dbo].[admAvg] as list inner join [sandbox].[dbo].[admAvg] as data
on
list.dtadmission between data.dtadmission and DATEADD(DD,6,data.dtadmission)
where list.dtAdmission = '5-Jan-2011'
See how list date is between datadate and dataplus6 in all the records?
来源:https://stackoverflow.com/questions/5407273/window-moving-average-in-sql-server