Moving averages with MongoDB's aggregation framework?

生来不讨喜 2020-12-04 01:28

If you have 50 years of temperature weather data (daily) (for example) how would you calculate moving averages, using 3-month intervals, for that time period? Can you do tha

5 answers
  •  青春惊慌失措
    2020-12-04 02:25

    The aggregation framework now has $map, $reduce, and $range built in, so array processing is much more straightforward. Below is an example that calculates a moving average over a set of data filtered by some predicate. The basic setup is that each doc contains filterable criteria and a value, e.g.

    {sym: "A", d: ISODate("2018-01-01"), val: 10}
    {sym: "A", d: ISODate("2018-01-02"), val: 30}
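
    To try the pipeline without real data, a few documents of this shape can be generated first. The sketch below is only an illustration: the helper name makeDocs and the test values are assumptions, not part of the original answer, and the insertMany call is shown as a comment because it needs a live mongo shell.

```javascript
// Hypothetical helper: build `n` consecutive daily documents for one symbol.
function makeDocs(sym, startIso, n) {
  const dayMs = 24 * 60 * 60 * 1000;
  const start = new Date(startIso).getTime();
  const docs = [];
  for (let i = 0; i < n; i++) {
    docs.push({
      sym: sym,
      d: new Date(start + i * dayMs), // one observation per day
      val: 10 + (i % 7)               // arbitrary, deterministic test values
    });
  }
  return docs;
}

const docs = makeDocs("S1", "2018-01-01T00:00:00Z", 10);
// In the mongo shell, these could then be loaded with: db.foo.insertMany(docs)
```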
    

    Here it is:

    // This controls the number of observations in the moving average:
    days = 4;
    
    c=db.foo.aggregate([
    
    // Filter down to what you want.  This can be anything or nothing at all.
    {$match: {"sym": "S1"}}
    
    // Ensure dates are going earliest to latest:
    ,{$sort: {d:1}}
    
    // Turn docs into a single doc with a big vector of observations, e.g.
    //     {sym: "A", d: d1, val: 10}
    //     {sym: "A", d: d2, val: 11}
    //     {sym: "A", d: d3, val: 13}
    // becomes
    //     {_id: "A", prx: [ {v:10,d:d1}, {v:11,d:d2},  {v:13,d:d3} ] }
    //
    // This will set us up to take advantage of array processing functions!
    ,{$group: {_id: "$sym", prx: {$push: {v:"$val",d:"$d"}} }}
    
    // Nice additional info.  Note use of dot notation on array to get
    // just scalar date at elem 0, not the object {v:val,d:date}:
    ,{$addFields: {numDays: days, startDate: {$arrayElemAt: [ "$prx.d", 0 ]}} }
    
    // The Juice!  Assume we have a variable "days" which is the desired number
    // of days of moving average.
    // The complex expression below does this in python pseudocode:
    //
    // for z in range(0, len(vector) - (days-1)):
    //    seg = vector[z:z+days]
    //    values = seg.v
    //    dates = seg.d
    //    tot = 0
    //    for v in values:
    //        tot += v
    //    avg = tot/len(seg)
    //    d = dates[-1]
    // 
    // Note that it is possible to overrun the segment at the end of the "walk"
    // along the vector, i.e. not enough date-values.  So we only run the
    // vector to (len(vector) - (days-1)).
    // Also, for extra info, each result carries the as-of date, which is the
    // tail date of the segment (numDays, added above, records the window size).
    //
    // Again we take advantage of dot notation to turn the vector of
    // object {v:val, d:date} into two vectors of simple scalars [v1,v2,...]
    // and [d1,d2,...] with $prx.v and $prx.d
    //
    ,{$addFields: {"prx": {$map: {
        input: {$range: [0, {$subtract: [{$size: "$prx"}, (days-1)]}]},
        as: "z",
        in: {
            avg: {$avg: {$slice: ["$prx.v", "$$z", days]}},
            d: {$arrayElemAt: ["$prx.d", {$add: ["$$z", (days-1)]}]}
        }
    }}}}
    
                ]);
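
    For checking the pipeline's results by hand, the same windowing logic can be sketched in plain JavaScript. This is a minimal sketch under assumptions, not part of the original answer: the function name movingAverage and the sample values are made up, and the dates are placeholder strings rather than real ISODate values.

```javascript
// Plain-JavaScript equivalent of the $map/$range/$slice stage.
// `prx` is an array of {v, d} objects already sorted by date.
function movingAverage(prx, days) {
  const out = [];
  // Same bounds as $range: z runs over [0, size - (days - 1)),
  // so the result is empty when there are fewer than `days` points.
  for (let z = 0; z + days <= prx.length; z++) {
    const seg = prx.slice(z, z + days);               // like $slice
    const tot = seg.reduce((sum, p) => sum + p.v, 0); // sum of values
    out.push({ avg: tot / seg.length, d: seg[seg.length - 1].d });
  }
  return out;
}

const sample = [
  { v: 10, d: "d1" }, { v: 11, d: "d2" }, { v: 13, d: "d3" },
  { v: 14, d: "d4" }, { v: 12, d: "d5" }
];
console.log(movingAverage(sample, 4));
// prints [ { avg: 12, d: 'd4' }, { avg: 12.5, d: 'd5' } ]
```

    Each output element pairs the average with the last date in its window, which matches the "d" field computed by $arrayElemAt at index z + (days-1).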
    

    This might produce the following output:

    {
        "_id" : "S1",
        "prx" : [
            {
                "avg" : 11.738793632512115,
                "d" : ISODate("2018-09-05T16:10:30.259Z")
            },
            {
                "avg" : 12.420766702631376,
                "d" : ISODate("2018-09-06T16:10:30.259Z")
            },
            ...
    
        ],
        "numDays" : 4,
        "startDate" : ISODate("2018-09-02T16:10:30.259Z")
    }
    
