MongoDB MapReduce, second argument of reduce function is multidimensional array

Submitted by 为君一笑 on 2019-12-11 18:57:02

Question


I tried to use mapReduce on my collection. Just for debugging, I returned the vals value passed as the second argument to the reduce function, like this:

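db.runCommand({
   "mapreduce": "MyCollection",
   "map": function() {
      // Key: country code, partner, and whole weeks since 1399240800 (604800 s = 7 days)
      emit({
         country_code: this.cc,
         partner: this.di,
         registeredPeriod: Math.floor((this.ca - 1399240800) / 604800)
      },
      {
         // Value: days from ca to lla, rounded up (86400 s = 1 day)
         count: Math.ceil((this.lla - this.ca) / 86400)
      });
   },
   "reduce": function(k, vals) {
      // Debugging only: hand back the raw vals array instead of reducing it
      return {
         'count': vals
      };
   },
   "query": {
      "ca": {
         "$gte": 1399240800
      },
      "di": 405,
      "cc": "1"
   },
   "out": {
      "inline": true
   }
});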

And I got a result like this:

{
"results" : [
    {
        "_id" : {
            "country_code" : "1",
            "distribution" : 405,
            "installationPeriod" : 0
        },
        "value" : {
            "count" : [
                {
                    "count" : 37
                },
                {
                    "count" : 38
                }
            ]
        }
    },
    {
        "_id" : {
            "country_code" : "1",
            "distribution" : 405,
            "installationPeriod" : 1
        },
        "value" : {
            "count" : 36
        }
    },
    {
        "_id" : {
            "country_code" : "1",
            "distribution" : 405,
            "installationPeriod" : 4
        },
        "value" : {
            "count" : [
                {
                    "count" : [
                        {
                            "count" : 16
                        },
                        {
                            "count" : 16
                        }
                    ]
                },
                {
                    "count" : 15
                }
            ]
        }
    }
],
"timeMillis" : 38,
"counts" : {
    "input" : 130,
    "emit" : 130,
    "reduce" : 5,
    "output" : 6
},
"ok" : 1
}

I really don't understand why I got a multidimensional array as the second argument to my reduce function. I mean this part of the result:

        {
        "_id" : {
            "country_code" : "1",
            "distribution" : 405,
            "installationPeriod" : 4
        },
        "value" : {
            "count" : [
                {
                    "count" : [ // <= Why is this multidimensional?
                        {
                            "count" : 16
                        }

Why is this multidimensional? And why is the key of the embedded array the same as the one returned from the reduce function?


Answer 1:


The reason is that this is how mapReduce works. From the documentation:

MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key.
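To see how the nesting arises, here is a minimal sketch that calls the question's reduce by hand in plain JavaScript (Node.js), outside MongoDB. Splitting the three values into a first batch of two plus a later batch is an assumption for illustration only; MongoDB decides the real batching internally, and the sample values are made up to mirror the installationPeriod 4 group above.

```javascript
// The reduce function from the question: it wraps the raw vals array
// in an object instead of actually reducing it.
function reduce(key, vals) {
  return { count: vals };
}

// Hypothetical emitted values for one key, shaped like the
// installationPeriod 4 group in the question's output.
const key = { country_code: "1", distribution: 405, installationPeriod: 4 };
const a = { count: 16 };
const b = { count: 16 };
const c = { count: 15 };

// First pass (assumed batching): only some of the values for this key are reduced.
const partial = reduce(key, [a, b]);

// Later pass: the previous reduce output becomes one of the inputs.
const result = reduce(key, [partial, c]);

console.log(JSON.stringify(result));
// prints {"count":[{"count":[{"count":16},{"count":16}]},{"count":15}]}
```

The printed value has the same shape as the installationPeriod 4 result in the question: the inner { count: [...] } is simply the object returned by the earlier reduce call.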

And from a later point:

the type of the return object must be identical to the type of the value emitted by the map function to ensure that the following operations is true:

So even though you have not "changed the signature" the documentation refers to, you are still only processing one batch of items in one reduce pass and another batch in the next pass. Eventually, the array returned for one fragment is combined with the array returned for another fragment.

So what happened is that your reduce returns an array, but it does not contain "all" of the items you emitted for the key, just some of them. Then another reduce call on the same "key" processes more items. Finally, those two results (or possibly more) are sent to reduce again, in an attempt to actually "reduce" them as intended.
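One way to check a reduce function against this behaviour is to compare reducing a key in two batches with reducing it all at once, which is the associativity requirement in the MongoDB reduce documentation: reduce(key, [C, reduce(key, [A, B])]) must equal reduce(key, [C, A, B]). The sketch below runs in plain JavaScript. reduceEcho mirrors the question's reduce; reduceSum is a hypothetical fix that assumes a per-key total of count is what you want. Comparing with JSON.stringify is a shortcut that only works here because both sides build their objects with the same key order.

```javascript
// Mirrors the question: returns the vals array unchanged.
function reduceEcho(key, vals) {
  return { count: vals };
}

// Hypothetical fix (assumes a per-key total is wanted): returns the same
// { count: <number> } shape that map emits. Written with var and an index
// loop so it does not depend on newer JavaScript syntax.
function reduceSum(key, vals) {
  var total = 0;
  for (var i = 0; i < vals.length; i++) {
    total += vals[i].count;
  }
  return { count: total };
}

// Compares "reduce in two batches" with "reduce all at once".
function isSafeToReReduce(reduce, key, a, b, c) {
  const inBatches = reduce(key, [c, reduce(key, [a, b])]);
  const allAtOnce = reduce(key, [c, a, b]);
  return JSON.stringify(inBatches) === JSON.stringify(allAtOnce);
}

const sampleKey = { country_code: "1", distribution: 405, installationPeriod: 4 };
const [A, B, C] = [{ count: 16 }, { count: 16 }, { count: 15 }];

console.log(isSafeToReReduce(reduceEcho, sampleKey, A, B, C)); // false
console.log(isSafeToReReduce(reduceSum, sampleKey, A, B, C));  // true
```

Only reduceSum returns the same type that map emits, which is why it gives the same answer however the values are batched. Its body could stand in for the debugging reduce in the question's db.runCommand call, if a sum is indeed the goal.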

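That is the general concept, so it is no surprise that when you just hand the vals array back, nested arrays are what you get. It also answers your second question: the inner "count" key is there because that inner element is the { count: [...] } object your own reduce returned in an earlier pass. (The installationPeriod 1 entry, with a plain number, is most likely a key that had only one emitted value: MongoDB does not call reduce for such a key and outputs the mapped value as-is.)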
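If the goal really is to collect every count for a key, one option (an assumption about intent, not something the answer prescribes) is to make map emit an array-valued count, e.g. count: [ Math.ceil((this.lla - this.ca)/86400) ], and have reduce concatenate. Then map and reduce produce the same { count: [...] } type and the result stays flat however the values are batched. The sketch below simulates this in plain JavaScript; toEmitted and reduceCollect are hypothetical names.

```javascript
// Hypothetical map output: each emitted value wraps its count in an array,
// so { count: [number, ...] } is both the map type and the reduce type.
function toEmitted(n) {
  return { count: [n] };
}

// Concatenates the arrays instead of nesting them. Every input, whether it
// comes straight from map or from an earlier reduce, is { count: [...] }.
function reduceCollect(key, vals) {
  var all = [];
  for (var i = 0; i < vals.length; i++) {
    for (var j = 0; j < vals[i].count.length; j++) {
      all.push(vals[i].count[j]);
    }
  }
  return { count: all };
}

const k = { country_code: "1", distribution: 405, installationPeriod: 4 };
const [x, y, z] = [16, 16, 15].map(toEmitted);

const batched = reduceCollect(k, [reduceCollect(k, [x, y]), z]);
const single = reduceCollect(k, [x, y, z]);

console.log(JSON.stringify(batched)); // {"count":[16,16,15]}
console.log(JSON.stringify(single));  // {"count":[16,16,15]}
```

MongoDB does not promise any particular order of vals, so the order of the collected numbers may differ in a real run, and the array grows with the number of documents per key.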

Short version: mapReduce processes the output "keys" in chunks, not all at once. Better to learn that now before it becomes a problem for you later.



Source: https://stackoverflow.com/questions/24283502/mongodb-mapreduce-second-argument-of-reduce-function-is-multidimensional-array
