Question
I have a data structure that keeps track of people in different cities:
//in db.persons
{
name: "John",
city: "Seattle"
},
{
name: "Bill",
city: "Portland"
}
I want to run a map reduce to get a list of how many people are in each city, so the result will look like this:
{
_id: "Seattle",
value: 10
}
My map reduce function looks like this:
map = function(){
var city = this.city
emit(city, 1);
};
reduce = function(key, values){
var result = 0;
values.forEach(function(value){
result += 1;
});
return result;
}
Very simple stuff: I figured it would take the city as a key, then add one to the result for each matching city it found. However, the values in the map reduce output were off by a large factor. Switching my reduce function to:
reduce = function(key, values){
var result = 0;
values.forEach(function(value){
result += value;
});
return result;
}
and adding the value to the result (which should be 1, as I understand it from my emit function) returned correct results.
Why are the results different? Wouldn't my value be 1 in the reduce function?
Answer 1:
This happens because MongoDB can invoke the reduce function multiple times for the same key. Here's a simple worked example:
Let's say you have just three documents in your database, each with the same 'city' of 'Seattle'. After the emit phase, you will have a set of emitted objects which look like
{'Seattle' : 1}, {'Seattle' : 1}, {'Seattle' : 1}
After the emit phase has completed, the reduce phase starts. In the simplest case, the reduce function will be called once as reduce('Seattle', [1,1,1]). In this case, your first function would work correctly. However, the reduce function may be called multiple times, with the output of one call passed back in as one of the values of a later call:
reduce('Seattle', [1,1]) -> {'Seattle' : 2}, {'Seattle' : 1}
reduce('Seattle', [2,1])
In this case, your first reduce function would return 2 after the second reduce call, because it counts the two items in the list of values instead of adding them up. Your second reduce function adds the values together rather than just counting them, so it returns 3, which is the correct answer.
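The two-step case above can be reproduced outside the database. The following is a minimal plain-JavaScript sketch, not MongoDB's actual scheduling: `reduceInChunks` is a hypothetical helper that reduces the emitted values in fixed-size chunks and then reduces the partial results, and `countingReduce` / `summingReduce` are just names given here to the two reduce functions from the question.

```javascript
// First reduce function from the question: counts elements.
function countingReduce(key, values) {
  var result = 0;
  values.forEach(function (value) {
    result += 1; // ignores the value, so partial counts are lost
  });
  return result;
}

// Second reduce function from the question: sums values.
function summingReduce(key, values) {
  var result = 0;
  values.forEach(function (value) {
    result += value; // a partial count of 2 contributes 2
  });
  return result;
}

// Hypothetical helper (an assumption for illustration only):
// reduce each chunk of emitted values, then reduce the partial results.
function reduceInChunks(reduceFn, key, values, chunkSize) {
  var partials = [];
  for (var i = 0; i < values.length; i += chunkSize) {
    partials.push(reduceFn(key, values.slice(i, i + chunkSize)));
  }
  return reduceFn(key, partials);
}

var emitted = [1, 1, 1]; // three documents with city "Seattle"

// Chunks are [1,1] and [1]; partial results are [2,1].
console.log(reduceInChunks(countingReduce, "Seattle", emitted, 2)); // 2
console.log(reduceInChunks(summingReduce, "Seattle", emitted, 2));  // 3
```

With more documents the counting version drifts further from the real total, since each extra level of reducing replaces a count with the number of partial results.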
I personally think the CouchDB docs explain slightly better why reduce functions need to be commutative and associative over their array of input values.
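One way to catch this kind of bug before running a job is to check that feeding a partial result back into reduce gives the same answer as reducing everything at once. The sketch below assumes numeric results, and `survivesReReduce` is a hypothetical helper name, not part of MongoDB or CouchDB.

```javascript
// Hypothetical check: does reducing [reduce(a), ...b] match reducing a.concat(b)?
// Assumes the reduce function returns a number, so === comparison is enough.
function survivesReReduce(reduceFn, key, a, b) {
  var direct = reduceFn(key, a.concat(b));
  var reReduced = reduceFn(key, [reduceFn(key, a)].concat(b));
  return direct === reReduced;
}

// Counts elements, like the first reduce function in the question.
function countOnly(key, values) {
  return values.length;
}

// Sums values, like the second reduce function in the question.
function sumValues(key, values) {
  return values.reduce(function (total, value) {
    return total + value;
  }, 0);
}

console.log(survivesReReduce(countOnly, "Seattle", [1, 1], [1])); // false
console.log(survivesReReduce(sumValues, "Seattle", [1, 1], [1])); // true
```

A passing check on one input does not prove the property in general, but a failing one is enough to show the reduce function is unsafe to re-reduce.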
Source: https://stackoverflow.com/questions/17871997/reduce-function-on-map-reduce-showing-incorrect-results-why