Bigquery moving average

后端 未结 3 574
遥遥无期
遥遥无期 2021-01-01 07:30

Can anybody tell me how to calculate moving average in bigquery.

Here is what I need in mysql style.

SELECT T1.id, T1.value_column1, avg(T2.value_col         


        
相关标签:
3条回答
  • 2021-01-01 07:45

    I have created the following "Times" table:

    Table Details: Dim_Periods
    Schema
    Date    TIMESTAMP   
    Year    INTEGER         
    Month   INTEGER         
    day         INTEGER         
    QUARTER INTEGER     
    DAYOFWEEK   INTEGER     
    MonthStart  TIMESTAMP   
    MonthEnd    TIMESTAMP   
    WeekStart   TIMESTAMP   
    WeekEnd TIMESTAMP   
    Back30Days  TIMESTAMP   -- the date 30 days before "Date"
    Back7Days   TIMESTAMP   -- the date 7 days before "Date"
    

    and I use such query to handle "running sums"

    SELECT Date,Count(*) as MovingCNT
    FROM
    
    (SELECT Date,
                    Back7Days 
                        FROM DWH.Dim_Periods  
                     where Date < timestamp(current_date()) AND
                                 Date >= (DATE_ADD (CURRENT_TIMESTAMP(), -5, 'month'))
                    )P
                    CROSS JOIN EACH
        (SELECT repository_url,repository_created_at
        FROM publicdata:samples.github_timeline
                    ) L
            WHERE timestamp(repository_created_at)>= Back7Days 
                  AND timestamp(repository_created_at)<= Date
    
    GROUP EACH BY Date
    

    Note that it can be used for "Month to date", Week to Date" "30 days back" etc. aggregations as well. However, performance is not the best and the query can take a while on larger data sets due to the Cartesian join. Hope this helps

    0 讨论(0)
  • 2021-01-01 07:51

    You can do the same thing, but since BigQuery only allows join on equality, it takes a little bit more work. Here is an example that will compute the 6 month moving average of birth-weights from the public natality sample.

    SELECT 
      --Convert months-since-year-0 back to year, month
      INTEGER(month / 12) as year, 
      month % 12 as month, 
      avg 
    FROM (
      SELECT month, 
        -- Note that this average is the average over all of the data in the 
        -- last 6 months, not an average over the avg values for the last 6 months.
        -- It is easy to compute the latter, if that is what is desired -- just 
        -- compute the average in the inner select, and take the average of those
        -- here.
         SUM(total_weight_per_month) / SUM(records_per_month) as avg
      FROM (
        SELECT 
           -- Note we use t2.month here since that is what is compared against
           -- 6 different t1 months.
           t2.month as month, 
           t1.records_per_month as records_per_month,     
           t1.total_weight_per_month as total_weight_per_month 
        FROM (
          SELECT month,
            COUNT( weight_pounds ) as records_per_month, 
            SUM( weight_pounds)  as total_weight_per_month,
            -- This active field is the key that lets us join all of the
            -- values against the values in the date subselect.
            1 AS active
          FROM (
            SELECT 
               -- Convert year and month fields to a single value that
               -- has the number of months since year 0. This will allow
               -- us to do math on the dates.
              year * 12 + month AS month, 
              weight_pounds 
            FROM [publicdata:samples.natality] 
            WHERE weight_pounds > 0)
          GROUP BY month) as t1
        JOIN  
          -- We join the weights per month agsint a subselect that contains
          -- all months.
          (SELECT month, 1 as active 
          FROM
            (SELECT 
               year * 12 + month AS month, 
            FROM [publicdata:samples.natality])
          GROUP BY month) as t2
        ON t1.active = t2.active 
        -- Here is where we get the moving average -- we basically take the month 
        -- value from t1 and make it apply for 6 months.
        WHERE t1.month >= t2.month && t1.month - 6 < t2.month)
      GROUP BY month
      ORDER BY month desc)
    
    0 讨论(0)
  • 2021-01-01 08:11

    For an updated and more efficient answer, https://stackoverflow.com/a/24943950/132438.


    Check the new LAG() and LEAD() window functions. They allow you to traverse the result set, without the need for a self join.

    https://developers.google.com/bigquery/docs/query-reference#windowfunctions

    A different option with JOIN EACH (this can get too slow as an extremely large amount of data can get generated in the intermediate steps):

    SELECT a.SensorId SensorId, a.Timestamp, AVG(b.Data) AS avg_prev_hour_load
    FROM (
      SELECT * FROM [io_sensor_data.moscone_io13]
      WHERE SensorId = 'XBee_40670EB0/mic') a
    JOIN EACH [io_sensor_data.moscone_io13] b
    ON a.SensorId = b.SensorId
    WHERE b.Timestamp BETWEEN (a.Timestamp - 3600000) AND a.Timestamp
    GROUP BY SensorId, a.Timestamp;
    

    (based on Joe Celko's SQL problems)

    0 讨论(0)
提交回复
热议问题