3 Month Moving Average - Redshift SQL

Question

I am trying to create a 3-month moving average (3MMA) from some data, using Redshift SQL or Domo BeastMode (if anyone is familiar with that).

The data is on a day to day basis, but needs to be displayed by month. So the quotes/revenue need to be summarized by month, and then a 3MMA needs to be calculated (excluding the current month).

So, if the quote was in April, I would need the average of Jan, Feb, Mar.

The input data looks like this:

Quote Date MM/DD/YYYY     Revenue
3/24/2015                 61214
8/4/2015                  22983
9/3/2015                  30000
9/15/2015                 171300
9/30/2015                 112000

And I need the output to look something like this:

Month               Revenue             3MMA
Jan 2015            =Sum of Jan Rev     =(Oct14 + Nov14 + Dec14) / 3
Feb 2015            =Sum of Feb Rev     =(Nov14 + Dec14 + Jan15) / 3
March 2015          =Sum of Mar Rev     =(Dec14 + Jan15 + Feb15) / 3
April 2015          =Sum of Apr Rev     =(Jan15 + Feb15 + Mar15) / 3
May 2015            =Sum of May Rev     =(Feb15 + Mar15 + Apr15) / 3

If anyone is able to help, I would be extremely grateful! I have been stuck on this for quite a while and have no idea what I'm doing when it comes to SQL lol.

Cheers, Logan.

Answer 1:

You can do this using aggregation and window functions:

select date_trunc('month', quotedate) as mon,
       sum(revenue) as mon_revenue,
       avg(sum(revenue)) over (order by date_trunc('month', quotedate)
                               rows between 2 preceding and current row) as revenue_3mon
from t
group by date_trunc('month', quotedate) 
order by mon;

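Note: this uses avg(), so for the first and second rows it divides by 1 and 2 respectively. It also assumes that you have at least one record for every month, because the window frame counts rows rather than calendar months. As written, the frame includes the current month; to exclude it, as the question asks, the frame would be "rows between 3 preceding and 1 preceding".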

EDIT:

I wonder if there is an issue with mixing aggregation functions and analytic functions in Redshift. Does the following work any better?

select m.*,
       avg(mon_revenue) over (order by mon rows between 2 preceding and current row) as revenue_3mon
from (select date_trunc('month', quotedate) as mon,
             sum(revenue) as mon_revenue
      from t
      group by date_trunc('month', quotedate) 
     ) m
order by mon;

Answer 2:

You could do something similar to the way we create buckets for a rolling 6 weeks (here the date column is called "date"):

case 
    when date_trunc('week',dateadd(day,1,date)) = date_trunc('week',dateadd(day,1,current_date)) then 'CW'
    when date_trunc('week',dateadd(day,1,date)) = date_trunc('week',dateadd(day,-6,current_date)) then 'LW'
    when date_trunc('week',dateadd(day,1,date)) = date_trunc('week',dateadd(day,-13,current_date)) then '2W'
    when date_trunc('week',dateadd(day,1,date)) = date_trunc('week',dateadd(day,-20,current_date)) then '3W'
    when date_trunc('week',dateadd(day,1,date)) = date_trunc('week',dateadd(day,-27,current_date)) then '4W'
    when date_trunc('week',dateadd(day,1,date)) = date_trunc('week',dateadd(day,-34,current_date)) then '5W'
    when date_trunc('week',dateadd(day,1,date)) = date_trunc('week',dateadd(day,-41,current_date)) then '6W'  
end as dateweek

You could then create an average in a subsequent step in the dataflow...
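
Adapted to months, the same bucketing idea might look like the sketch below. It assumes the table t with columns quotedate and revenue used in the other answers, and the labels '1M' to '3M' are made up for illustration. Because the buckets are anchored to current_date, this yields a single 3MMA for the current month rather than one value per month.

```sql
-- Sketch only: table t(quotedate, revenue) and the bucket labels are assumptions.
with bucketed as (
    select case
               when date_trunc('month', quotedate) = date_trunc('month', dateadd(month, -1, current_date)) then '1M'
               when date_trunc('month', quotedate) = date_trunc('month', dateadd(month, -2, current_date)) then '2M'
               when date_trunc('month', quotedate) = date_trunc('month', dateadd(month, -3, current_date)) then '3M'
           end as datemonth,
           revenue
    from t
)
-- Sum the three prior-month buckets and divide by a fixed 3,
-- so a month with no quotes counts as zero revenue.
select sum(revenue) / 3.0 as revenue_3mon
from bucketed
where datemonth is not null;
```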

Answer 3:

You cannot use aggregate functions and analytic functions together. The query should be:

select m.*,
       -- a frame of 3 preceding to 1 preceding excludes the current row
       avg(mon_revenue) over (order by mon rows between 3 preceding and 1 preceding) as revenue_3mon
from (select date_trunc('month', quotedate) as mon,
             sum(revenue) as mon_revenue
      from t
      group by date_trunc('month', quotedate) 
     ) m
order by mon;

The frame clause is "rows between 3 preceding and 1 preceding", with no trailing "row" at the end; otherwise Redshift won't accept it.
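
The frame-based queries above count rows rather than calendar months, so they only give a true 3MMA when every month has at least one quote. The sample data in the question skips April through July, so a variant that tolerates gaps may be useful. The following is a sketch of one possible approach, not a tested Redshift recipe: it joins the monthly totals to themselves over a three-month date range and divides by a fixed 3, treating missing months as zero revenue. The temp table t and its columns quotedate and revenue are assumptions carried over from the answers; the rows are the question's sample data.

```sql
-- Assumed table layout, loaded with the sample rows from the question.
create temp table t (quotedate date, revenue integer);

insert into t values
    ('2015-03-24',  61214),
    ('2015-08-04',  22983),
    ('2015-09-03',  30000),
    ('2015-09-15', 171300),
    ('2015-09-30', 112000);

with monthly as (
    -- one row per month that has at least one quote
    select date_trunc('month', quotedate) as mon,
           sum(revenue) as mon_revenue
    from t
    group by 1
)
select cur.mon,
       cur.mon_revenue,
       -- total of the three calendar months before cur.mon, divided by 3;
       -- months without quotes contribute nothing to the sum
       coalesce(sum(prev.mon_revenue), 0) / 3.0 as revenue_3mon
from monthly cur
left join monthly prev
       on prev.mon >= dateadd(month, -3, cur.mon)
      and prev.mon <  cur.mon
group by cur.mon, cur.mon_revenue
order by cur.mon;
```

For September 2015 this gives 22983 / 3 = 7661, since August is the only one of June, July and August with revenue. A row-based frame of "3 preceding and 1 preceding" would instead average the March and August totals, (61214 + 22983) / 2 = 42098.5. Months with no quotes at all still do not appear in the output; listing them would need a calendar table to join against.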

Source: https://stackoverflow.com/questions/36120327/3-month-moving-average-redshift-sql

Tags

sql

amazon-redshift

domo