Grouping documents in pairs using mongo aggregation

你。 提交于 2019-12-06 15:57:45

This is something that just cannot be done with the aggregation framework, and the only current MongoDB method available for this type of operation is mapReduce.

The reason being that the a aggregation framework has no way of referring to any other document in the pipeline than the present one. This actually applies to "grouping" pipeline stages as well, since even though things are grouped on a "key" you cant really deal with individual documents in the way you want to.

MapReduce on the other hand has one feature available that allows you to do what you want here, and it's not even "directly" related to aggregation. It is in fact the ability to have "globally scoped variables" across all stages. And having a "variable" to basically "store the last document" is all you need to achieve your result.

So it's quite simple code, and there is in fact no "reduction" required:

db.collection.mapReduce(
    function () {
      if (lastVal != null)
        emit( this._id, this.val - lastVal );
      lastVal = this.val;
    },
    function() {}, // mapper is not called
    {
        "scope": { "lastVal": null },
        "out": { "inline": 1 }
    }
)

Which gives you a result much like this:

{
    "results" : [
            {
                    "_id" : ObjectId("54a425a99b8bcd6f73e2d662"),
                    "value" : 2
            },
            {
                    "_id" : ObjectId("54a425a99b8bcd6f73e2d663"),
                    "value" : 3
            },
            {
                    "_id" : ObjectId("54a425a99b8bcd6f73e2d664"),
                    "value" : 4
            }
    ],
    "timeMillis" : 3,
    "counts" : {
            "input" : 4,
            "emit" : 3,
            "reduce" : 0,
            "output" : 3
    },
    "ok" : 1
}

That's really just picking "something unique" as the emitted _id value rather than anything specific, because all this is really doing is the difference between values on differing documents.

Global variables are usually the solution to these types of "pairing" aggregations or producing "running totals". Right now the aggregation framework has no access to global variables, even though it might well be a nice this to have. The mapReduce framework has them, so it is probably fair to say that they should be available to the aggregation framework as well.

Right now they are not though, so stick with mapReduce.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!