Reduce function on Map Reduce showing incorrect results — why?

Submitted by 元气小坏坏 on 2019-12-24 02:18:09

Question


I have a data structure that keeps track of people in different cities:

//in db.persons
{
  name: "John",
  city: "Seattle"
},
{
  name: "Bill",
  city: "Portland"
}

I want to run a map reduce to get a list of how many people are in each city, so the result will look like this:

{
  _id: "Seattle",
  value: 10
}

My map reduce function looks like this:

map = function(){
  var city = this.city;
  emit(city, 1);
};


reduce = function(key, values){
    var result = 0;
    values.forEach(function(value){
      result += 1;
    });
    return result;
}

Very simple stuff: I figured it would take the city as a key, then add one to the result for each matching city it found. However, in the resulting map reduce output, the value was off by a large factor. Switching my reduce function to:

reduce = function(key, values){
    var result = 0;
    values.forEach(function(value){
      result += value;
    });
    return result;
}

And adding the value to the result (which should be 1, as I understand it from my emit function) returned correct results.

Why are the results different? Wouldn't my value be 1 in the reduce function?


Answer 1:


This happens because MongoDB can invoke the reduce function multiple times for the same key. Here's a simple worked example:

Let's say you have just three documents in your database, each with the same 'city' of 'Seattle'. After the emit phase, you will have a set of emitted objects which look like

{'Seattle' : 1}, {'Seattle' : 1}, {'Seattle' : 1}

After the emit phase has completed, the reduce phase starts. In the simplest case, the reduce function will be called as reduce('Seattle', [1,1,1]). In this case, your first function would work correctly. However, the reduce function may be called multiple times:

reduce('Seattle', [1,1]) -> {'Seattle' : 2}, {'Seattle' : 1}

reduce('Seattle', [2,1])

In this case, your first reduce function would return 2 after the second reduce call, because it only counts the two items in the list [2,1], even though three documents were emitted. Your second reduce function adds the values together rather than just counting them, so it returns 3, which is the correct answer.
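The two-call sequence can be reproduced outside MongoDB with plain JavaScript. This is only a sketch: `countReduce` and `sumReduce` are names given here to the two reduce functions from the question, and `reduceInTwoPasses` is a hypothetical helper that imitates one possible way the calls could be split up, not how MongoDB actually schedules them.

```javascript
// The first reduce function from the question: counts list entries.
function countReduce(key, values) {
  var result = 0;
  values.forEach(function (value) {
    result += 1; // ignores the contents of each value
  });
  return result;
}

// The second reduce function from the question: sums the values.
function sumReduce(key, values) {
  var result = 0;
  values.forEach(function (value) {
    result += value; // a value may already be a partial count
  });
  return result;
}

// Hypothetical helper: reduce the first two values, then reduce that
// partial result again together with the remaining values.
function reduceInTwoPasses(reduceFn, key, values) {
  var partial = reduceFn(key, values.slice(0, 2));
  return reduceFn(key, [partial].concat(values.slice(2)));
}

var emitted = [1, 1, 1]; // three 'Seattle' documents, each emitting 1
var countResult = reduceInTwoPasses(countReduce, 'Seattle', emitted);
var sumResult = reduceInTwoPasses(sumReduce, 'Seattle', emitted);
console.log(countResult, sumResult); // prints: 2 3
```

Called once with all three values, both functions return 3; they only diverge when a partial result is fed back in.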

Personally, I think the CouchDB docs explain slightly better why a reduce function has to be commutative and associative over its array of input values.
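One rough way to test a reduce function for this property is to compare a single call against a split call. The helper below, `survivesRereduce`, is a hypothetical name used for illustration, and `sumValues` and `countValues` are compact stand-ins for the two approaches in the question. It checks only one split point, so passing it does not prove the function is safe.

```javascript
// Hypothetical helper: true if reducing a partial result a second time
// gives the same answer as reducing all values in a single call.
function survivesRereduce(reduceFn, key, values, splitAt) {
  var onePass = reduceFn(key, values);
  var partial = reduceFn(key, values.slice(0, splitAt));
  var twoPass = reduceFn(key, [partial].concat(values.slice(splitAt)));
  return onePass === twoPass;
}

// Stand-in for the summing approach.
function sumValues(key, values) {
  return values.reduce(function (total, value) {
    return total + value;
  }, 0);
}

// Stand-in for the counting approach.
function countValues(key, values) {
  return values.length;
}

var sumIsSafe = survivesRereduce(sumValues, 'Seattle', [1, 1, 1], 2);
var countIsSafe = survivesRereduce(countValues, 'Seattle', [1, 1, 1], 2);
console.log(sumIsSafe, countIsSafe); // prints: true false
```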



Source: https://stackoverflow.com/questions/17871997/reduce-function-on-map-reduce-showing-incorrect-results-why
