MongoDB Aggregation: Compute Running Totals from sum of previous rows

Posted by 可紊 on 2019-12-29 04:03:45

Question


Sample Documents:

{
 _id: ObjectId('4f442120eb03305789000000'),
 time: ISODate("2013-10-10T20:55:36Z"),
 value:1
},
{
 _id: ObjectId('4f442120eb03305789000001'),
 time: ISODate("2013-10-10T23:43:16Z"),
 value:2
},
{
 _id: ObjectId('4f442120eb03305789000002'),
 time: ISODate("2013-10-11T07:12:06Z"),
 value:3
},
{
 _id: ObjectId('4f442120eb03305789000003'),
 time: ISODate("2013-10-11T10:15:38Z"),
 value:4
},
{
 _id: ObjectId('4f442120eb03305789000004'),
 time: ISODate("2013-10-12T06:15:38Z"),
 value:5
}

It's easy to get aggregated results grouped by date. What I want, however, is a query that also returns a running total across those groups, like:

{
 time: "2013-10-10",
 total: 3,
 runningTotal: 3
},
{
 time: "2013-10-11",
 total: 7,
 runningTotal: 10
},
{
 time: "2013-10-12",
 total: 5,
 runningTotal: 15
}

Is this possible with the MongoDB Aggregation?


Answer 1:


This does what you need. I have normalised the times in the data so that they group together; without that, the first $group on '$time' produces one group per distinct timestamp rather than one per day. The idea is to $group and $push the times and totals into separate arrays. Then $unwind the time array, which gives each time document its own copy of the full totals array. You can then calculate the runningTotal (or something like a rolling average) from that array, since it contains the data for all of the times. The 'index' generated by $unwind is the array index of the total corresponding to that time. It is important to $sort before the second $group, since this ensures that the arrays are built in the correct order.

db.temp.aggregate(
    [
        {
            '$group': {
                '_id': '$time',
                'total': { '$sum': '$value' }
            }
        },
        {
            '$sort': {
                 '_id': 1
            }
        },
        {
            '$group': {
                '_id': 0,
                'time': { '$push': '$_id' },
                'totals': { '$push': '$total' }
            }
        },
        {
            '$unwind': {
                'path' : '$time',
                'includeArrayIndex' : 'index'
            }
        },
        {
            '$project': {
                '_id': 0,
                'time': { '$dateToString': { 'format': '%Y-%m-%d', 'date': '$time' }  },
                'total': { '$arrayElemAt': [ '$totals', '$index' ] },
                'runningTotal': { '$sum': { '$slice': [ '$totals', { '$add': [ '$index', 1 ] } ] } },
            }
        },
    ]
);

I have used something similar on a collection with ~80,000 documents, aggregating to 63 results. I am not sure how well it will work on larger collections, but I have found that performing transformations (projections, array manipulations) on aggregated data does not seem to have a large performance cost once the data has been reduced to a manageable size.
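
To make the array-index trick easier to follow, here is a minimal sketch of the same slice-and-sum logic in plain JavaScript rather than in the aggregation framework. The `daily` array and the variable names are assumptions made up for illustration; they stand in for the sorted output of the first $group and $sort stages.

```javascript
// Plain JavaScript illustration only; this is not MongoDB code.
// `daily` is hypothetical: it stands in for the sorted per-day totals.
const daily = [
  { time: '2013-10-10', total: 3 },
  { time: '2013-10-11', total: 7 },
  { time: '2013-10-12', total: 5 },
];

// Counterpart of the second $group: collect every total into one array.
const totals = daily.map((d) => d.total);

// Counterpart of $unwind (with includeArrayIndex) followed by $project:
// each row looks up its own total by index and sums the slice [0, index].
const withRunningTotal = daily.map((d, index) => ({
  time: d.time,
  total: totals[index],
  runningTotal: totals.slice(0, index + 1).reduce((a, b) => a + b, 0),
}));

console.log(withRunningTotal);
```

Each row re-sums a prefix of the array, so the work grows quadratically with the number of groups, which is fine for a few dozen days but worth keeping in mind for long series.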




Answer 2:


Here is another approach.

pipeline

db.col.aggregate([
    {$group : {
        _id : { time :{ $dateToString: {format: "%Y-%m-%d", date: "$time", timezone: "-05:00"}}},
        value : {$sum : "$value"}
    }},
    {$addFields : {_id : "$_id.time"}},
    {$sort : {_id : 1}},
    {$group : {_id : null, data : {$push : "$$ROOT"}}},
    {$addFields : {data : {
        $reduce : {
            input : "$data",
            initialValue : {total : 0, d : []},
            in : {
                total : {$sum : ["$$this.value", "$$value.total"]},                
                d : {$concatArrays : [
                        "$$value.d",
                        [{
                            _id : "$$this._id",
                            value : "$$this.value",
                            runningTotal : {$sum : ["$$value.total", "$$this.value"]}
                        }]
                ]}
            }
        }
    }}},
    {$unwind : "$data.d"},
    {$replaceRoot : {newRoot : "$data.d"}}
]).pretty()

collection

> db.col.find()
{ "_id" : ObjectId("4f442120eb03305789000000"), "time" : ISODate("2013-10-10T20:55:36Z"), "value" : 1 }
{ "_id" : ObjectId("4f442120eb03305789000001"), "time" : ISODate("2013-10-11T04:43:16Z"), "value" : 2 }
{ "_id" : ObjectId("4f442120eb03305789000002"), "time" : ISODate("2013-10-12T03:13:06Z"), "value" : 3 }
{ "_id" : ObjectId("4f442120eb03305789000003"), "time" : ISODate("2013-10-11T10:15:38Z"), "value" : 4 }
{ "_id" : ObjectId("4f442120eb03305789000004"), "time" : ISODate("2013-10-13T02:15:38Z"), "value" : 5 }

result

{ "_id" : "2013-10-10", "value" : 3, "runningTotal" : 3 }
{ "_id" : "2013-10-11", "value" : 7, "runningTotal" : 10 }
{ "_id" : "2013-10-12", "value" : 5, "runningTotal" : 15 }
> 
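
The $reduce stage above carries an accumulator with two fields: the total so far and the array of output rows. A rough plain-JavaScript equivalent, using `Array.prototype.reduce`, may make that shape clearer. The `grouped` input below is an assumption for illustration, mirroring the documents that reach the $reduce stage.

```javascript
// Plain JavaScript illustration only; this is not MongoDB code.
// `grouped` is hypothetical: it mirrors the sorted per-day documents in `data`.
const grouped = [
  { _id: '2013-10-10', value: 3 },
  { _id: '2013-10-11', value: 7 },
  { _id: '2013-10-12', value: 5 },
];

// The accumulator plays the role of $$value; `cur` plays the role of $$this.
const reduced = grouped.reduce(
  (acc, cur) => {
    const total = acc.total + cur.value;
    return {
      total,
      // Counterpart of $concatArrays: append one row with its running total.
      d: [...acc.d, { _id: cur._id, value: cur.value, runningTotal: total }],
    };
  },
  { total: 0, d: [] } // counterpart of initialValue
);

console.log(reduced.d);
```

Because the total is carried forward, each element is visited once, in contrast to re-summing a slice for every row.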



Answer 3:


Here is a solution that does not push the previous documents into a new array before processing them. (If the array gets too big, you can exceed the maximum BSON document size of 16 MB.)

Calculating running totals per document is as simple as:

db.collection1.aggregate(
[
  {
    $lookup: {
      from: 'collection1',
      let: { date_to: '$time' },
      pipeline: [
        {
          $match: {
            $expr: {
              $lt: [ '$time', '$$date_to' ]
            }
          }
        },
        {
          $group: {
            _id: null,
            summary: {
              $sum: '$value'
            }
          }
        }
      ],
      as: 'sum_prev_days'
    }
  },
  {
    $addFields: {
      sum_prev_days: {
        $arrayElemAt: [ '$sum_prev_days', 0 ]
      }
    }
  },
  {
    $addFields: {
      running_total: {
        $sum: [ '$value', '$sum_prev_days.summary' ]
      }
    }
  },
  {
    $project: { sum_prev_days: 0 }
  }
]
)

What we did: within the $lookup, we selected all documents with an earlier timestamp and immediately summed their values (using $group as the second stage of the lookup's pipeline). $lookup returns that sum as the first element of an array, so we pull out the first element and then compute the running total: the current value plus the sum of the previous values.
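
The self-join can be sketched in plain JavaScript as follows. This is only an illustration of the logic, not MongoDB code; the `docs` array reuses the sample timestamps shown earlier in this thread, and the names `sumPrev` and `result` are hypothetical.

```javascript
// Plain JavaScript illustration only; this is not MongoDB code.
const docs = [
  { time: new Date('2013-10-10T20:55:36Z'), value: 1 },
  { time: new Date('2013-10-11T04:43:16Z'), value: 2 },
  { time: new Date('2013-10-11T10:15:38Z'), value: 4 },
  { time: new Date('2013-10-12T03:13:06Z'), value: 3 },
  { time: new Date('2013-10-13T02:15:38Z'), value: 5 },
];

const result = docs.map((doc) => {
  // Counterpart of the $lookup sub-pipeline: strictly earlier documents
  // ($lt), summed into a single number ($group with $sum).
  const sumPrev = docs
    .filter((other) => other.time < doc.time)
    .reduce((sum, other) => sum + other.value, 0);

  // Counterpart of the final $addFields: current value plus previous sum.
  return { ...doc, running_total: doc.value + sumPrev };
});

console.log(result.map((r) => r.running_total));
```

Note that the comparison is strict, so two documents with exactly the same timestamp do not count each other's values. The earliest document has no earlier rows, and its running total is simply its own value.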

If you would like to group the transactions by day and then calculate running totals, insert a $group stage at the beginning and add the same $group to the $lookup's pipeline:

db.collection1.aggregate(
[
  {
    $group: {
      _id: {
        $substrBytes: ['$time', 0, 10]
      },
      value: {
        $sum: '$value'
      }
    }
  },
  {
    $lookup: {
      from: 'collection1',
      let: { date_to: '$_id' },
      pipeline: [
        {
          $group: {
            _id: {
              $substrBytes: ['$time', 0, 10]
            },
            value: {
              $sum: '$value'
            }
          }
        },
        {
          $match: {
            $expr: {
              $lt: [ '$_id', '$$date_to' ]
            }
          }
        },
        {
          $group: {
            _id: null,
            summary: {
              $sum: '$value'
            }
          }
        }
      ],
      as: 'sum_prev_days'
    }
  },
  {
    $addFields: {
      sum_prev_days: {
        $arrayElemAt: [ '$sum_prev_days', 0 ]
      }
    }
  },
  {
    $addFields: {
      running_total: {
        $sum: [ '$value', '$sum_prev_days.summary' ]
      }
    }
  },
  {
    $project: { sum_prev_days: 0 }
  }
]
)

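The result (shown in date order; $group does not guarantee output order, so append { $sort: { _id: 1 } } if you need it) is: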

{ "_id" : "2013-10-10", "value" : 3, "running_total" : 3 }
{ "_id" : "2013-10-11", "value" : 7, "running_total" : 10 }
{ "_id" : "2013-10-12", "value" : 5, "running_total" : 15 }


Source: https://stackoverflow.com/questions/16191125/mongodb-aggregation-compute-running-totals-from-sum-of-previous-rows
