Question
I am trying to use mapReduce on my collection. For debugging, I made the reduce function return vals, the value passed to it as its second argument, like this:
db.runCommand({
    "mapreduce": "MyCollection",
    "map": function() {
        emit({
            country_code: this.cc,
            partner: this.di,
            registeredPeriod: Math.floor((this.ca - 1399240800) / 604800)
        }, {
            count: Math.ceil((this.lla - this.ca) / 86400)
        });
    },
    "reduce": function(k, vals) {
        return {
            'count': vals
        };
    },
    "query": {
        "ca": { "$gte": 1399240800 },
        "di": 405,
        "cc": "1"
    },
    "out": { "inline": true }
});
And I got a result like this:
{
    "results" : [
        {
            "_id" : {
                "country_code" : "1",
                "distribution" : 405,
                "installationPeriod" : 0
            },
            "value" : {
                "count" : [
                    { "count" : 37 },
                    { "count" : 38 }
                ]
            }
        },
        {
            "_id" : {
                "country_code" : "1",
                "distribution" : 405,
                "installationPeriod" : 1
            },
            "value" : {
                "count" : 36
            }
        },
        {
            "_id" : {
                "country_code" : "1",
                "distribution" : 405,
                "installationPeriod" : 4
            },
            "value" : {
                "count" : [
                    {
                        "count" : [
                            { "count" : 16 },
                            { "count" : 16 }
                        ]
                    },
                    { "count" : 15 }
                ]
            }
        }
    ],
    "timeMillis" : 38,
    "counts" : {
        "input" : 130,
        "emit" : 130,
        "reduce" : 5,
        "output" : 6
    },
    "ok" : 1
}
I don't understand why I get a multidimensional array as the second argument of my reduce function. I mean this part of the result:
{
    "_id" : {
        "country_code" : "1",
        "distribution" : 405,
        "installationPeriod" : 4
    },
    "value" : {
        "count" : [
            {
                "count" : [ // <= Why is this multidimensional?
                    {
                        "count" : 16
                    }
Why is this multidimensional? And why is the key of the embedded array the same as the one returned from my reduce function?
Answer 1:
The reason is simply how mapReduce works. From the documentation:
MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key.
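To make this concrete, here is a small plain-JavaScript model of that behaviour. It is a simplified sketch, not MongoDB's actual scheduling: the helper reduceInChunks and the chunk size of 2 are hypothetical, chosen so that the asker's reduce function reproduces the nesting seen for installationPeriod 4. The model also follows the documented rule that MongoDB does not call reduce for a key with only a single emitted value, which is why installationPeriod 1 comes back as a plain { "count" : 36 }.

```javascript
// The asker's reduce: it wraps the incoming values array instead of reducing it.
function naiveReduce(key, vals) {
  return { count: vals };
}

// Hypothetical model of MongoDB reducing one key in several passes.
// Each pass feeds the previous pass's output back in as one of its inputs.
function reduceInChunks(reduceFn, key, values, chunkSize) {
  let partial = null;
  for (let i = 0; i < values.length; i += chunkSize) {
    const chunk = values.slice(i, i + chunkSize);
    const input = partial === null ? chunk : [partial, ...chunk];
    // A lone value is passed through unchanged; reduce is not called for it.
    partial = input.length > 1 ? reduceFn(key, input) : input[0];
  }
  return partial;
}

const emitted = [{ count: 16 }, { count: 16 }, { count: 15 }];
const result = reduceInChunks(naiveReduce, "period-4", emitted, 2);
console.log(JSON.stringify(result));
// {"count":[{"count":[{"count":16},{"count":16}]},{"count":15}]}
```

The first pass wraps [16, 16]; the second pass receives that wrapper plus { count: 15 } and wraps both again, which is exactly the shape in the question.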
And a later point:
the type of the return object must be identical to the type of the value emitted by the map function to ensure that the following operations is true: reduce(key, [ C, reduce(key, [ A, B ]) ] ) == reduce( key, [ C, A, B ] )
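That requirement is associativity: reducing an earlier partial result together with a new value must give the same answer as reducing all the values at once. The sketch below checks it in plain JavaScript for the asker's reduce and for a hypothetical sumReduce, which assumes the intent is to total the emitted count values. Only sumReduce returns the same shape that map emits, and only sumReduce passes.

```javascript
// The asker's reduce: wraps the incoming values array instead of reducing it.
function naiveReduce(key, vals) {
  return { count: vals };
}

// Hypothetical alternative, assuming the goal is to total the emitted counts.
function sumReduce(key, vals) {
  let total = 0;
  for (const v of vals) {
    total += v.count;
  }
  return { count: total };
}

// Compare reduce(key, [C, reduce(key, [A, B])]) with reduce(key, [C, A, B]).
function isAssociative(reduceFn, key, a, b, c) {
  const nested = reduceFn(key, [c, reduceFn(key, [a, b])]);
  const flat = reduceFn(key, [c, a, b]);
  // JSON comparison is sufficient for these small plain objects.
  return JSON.stringify(nested) === JSON.stringify(flat);
}

const A = { count: 16 };
const B = { count: 16 };
const C = { count: 15 };
console.log(isAssociative(naiveReduce, "period-4", A, B, C)); // false
console.log(isAssociative(sumReduce, "period-4", A, B, C)); // true
```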
So even though you have not "changed the signature" in the way that documentation warns about, each reduce pass still processes only n items at a time, and the next pass processes another n items. When these fragments are eventually processed together, the array returned for one fragment is combined with the array returned for another.
So what happened is that your reduce returns an array, but it does not hold "all" of the items you emitted for the key, just some of them. Then another reduce call on the same key processes more items. Finally, those two arrays (or possibly more) are sent to reduce again, in an attempt to actually "reduce" them as intended.
That is the general concept, so it is no surprise that when you simply hand the values array back, nested arrays are what you get.
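If the goal of the debugging exercise really is to see every value emitted for a key, one way to respect the "same type" rule is to make map emit an array and have reduce concatenate arrays. This is an assumption about what was wanted: the map side would emit { count: [days] } instead of { count: days }. The chunked-pass helper below is a hypothetical, simplified model of repeated reduce calls (not MongoDB's real scheduler); with it, the result stays flat however the passes are split.

```javascript
// Assumed map output: each emitted value already wraps its number in an array,
// e.g. emit(key, { count: [Math.ceil((this.lla - this.ca) / 86400)] }).
function collectReduce(key, vals) {
  let all = [];
  for (const v of vals) {
    // Works for raw emits and for outputs of earlier passes alike.
    all = all.concat(v.count);
  }
  return { count: all }; // same shape as the emitted value
}

// Hypothetical model of several reduce passes over one key.
function reduceInChunks(reduceFn, key, values, chunkSize) {
  let partial = null;
  for (let i = 0; i < values.length; i += chunkSize) {
    const chunk = values.slice(i, i + chunkSize);
    const input = partial === null ? chunk : [partial, ...chunk];
    partial = input.length > 1 ? reduceFn(key, input) : input[0];
  }
  return partial;
}

const emitted = [{ count: [16] }, { count: [16] }, { count: [15] }];
for (const size of [1, 2, 3]) {
  console.log(size, JSON.stringify(reduceInChunks(collectReduce, "period-4", emitted, size)));
}
// every line ends with {"count":[16,16,15]}
```

In a real run the order of the collected values should not be relied on; the model only shows that the nesting disappears.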
Short version: mapReduce processes the output "keys" in chunks, not all at once. Better to learn that now before it becomes a problem for you later.
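With a reduce that actually reduces, the chunking stops mattering. The sketch below reuses the per-key values visible in the question's output and assumes, purely for illustration, that the intent was to sum the day counts; the body of sumReduce is the kind of function that would go into the "reduce" field of the runCommand call. The reduceInChunks helper is a hypothetical, simplified model of repeated reduce passes, not MongoDB's real scheduler.

```javascript
// Hypothetical corrected reduce, assuming the goal is total days per key.
// It returns { count: <number> }, the same shape that map emits.
function sumReduce(key, vals) {
  let total = 0;
  for (const v of vals) {
    total += v.count;
  }
  return { count: total };
}

// Hypothetical model of several reduce passes over one key.
function reduceInChunks(reduceFn, key, values, chunkSize) {
  let partial = null;
  for (let i = 0; i < values.length; i += chunkSize) {
    const chunk = values.slice(i, i + chunkSize);
    const input = partial === null ? chunk : [partial, ...chunk];
    // Reduce is skipped when only one value is available.
    partial = input.length > 1 ? reduceFn(key, input) : input[0];
  }
  return partial;
}

// Emitted values per key, taken from the question's output.
const emittedByKey = {
  period0: [{ count: 37 }, { count: 38 }],
  period1: [{ count: 36 }],
  period4: [{ count: 16 }, { count: 16 }, { count: 15 }],
};

for (const [key, values] of Object.entries(emittedByKey)) {
  const totals = [1, 2, 3].map((size) => reduceInChunks(sumReduce, key, values, size).count);
  console.log(key, totals.join(" "));
}
// period0 75 75 75
// period1 36 36 36
// period4 47 47 47
```

Each key yields the same total for every chunk size, which is the property the documentation asks for.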
Source: https://stackoverflow.com/questions/24283502/mongodb-mapreduce-second-argument-of-reduce-function-is-multidimensional-array