MongoDB MapReduce - Emitting one key/one value doesn't call reduce

﹥>﹥吖頭↗ submitted on 2021-02-05 18:54:49

Question


I'm new to MongoDB and to MapReduce in general, and I came across this "quirk" (or at least it seems like a quirk to me).

Say I have objects in my collection like so:

{'key':5, 'value':5}

{'key':5, 'value':4}

{'key':5, 'value':1}

{'key':4, 'value':6}

{'key':4, 'value':4}

{'key':3, 'value':0}

My map function simply emits the key and the value.

My reduce function simply adds up the values and, before returning the sum, adds 1 (I did this to check whether the reduce function is even called).

My results follow:

{'_id': 3, 'value': 0}

{'_id':4, 'value': 11.0}

{'_id':5, 'value': 11.0}

As you can see, for keys 4 and 5 I get the expected answer of 11, but for key 3 (which has only one entry in the collection) I get an unexpected 0!

Is this the natural behavior of MapReduce in general? Of MongoDB? Of pymongo (which I am using)?


Answer 1:


The reduce function combines documents with the same key into one document. If the map function emits a single document for a particular key (as is the case with key 3), the reduce function will not be called.
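The behavior can be reproduced without a database. The following is a minimal in-memory sketch, not MongoDB's actual implementation: `runMapReduce` and `reduceCalls` are hypothetical names invented for this example, and `emit` is passed to the map function as an argument rather than being a global as it is in MongoDB. The documents and the "sum plus 1" reduce are taken from the question.

```javascript
// In-memory imitation of the behavior described above; NOT MongoDB's implementation.
// runMapReduce and reduceCalls are hypothetical names used only for this sketch.
let reduceCalls = 0;

function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // Assumption: emit is handed to map as an argument instead of being a global.
  for (const doc of docs) map.call(doc, emit);

  const results = [];
  for (const [key, values] of groups) {
    // A key with exactly one emitted value bypasses reduce entirely.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value: value });
  }
  return results.sort((a, b) => a._id - b._id);
}

// The six documents from the question.
const docs = [
  { key: 5, value: 5 },
  { key: 5, value: 4 },
  { key: 5, value: 1 },
  { key: 4, value: 6 },
  { key: 4, value: 4 },
  { key: 3, value: 0 },
];

function map(emit) {
  emit(this.key, this.value);
}

// Sums the values and adds 1, as the question describes.
function reduce(key, values) {
  reduceCalls += 1;
  let total = 0;
  for (const v of values) total += v;
  return total + 1;
}

const results = runMapReduce(docs, map, reduce);
console.log(results);
console.log("reduce calls:", reduceCalls);
```

Keys 4 and 5 each pass through reduce once and come out as 11, while key 3 keeps its emitted value of 0 because reduce never sees it. Note that adding 1 inside reduce also means the result would change if a partial result were reduced again, so it is only useful as a probe.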




Answer 2:


I realize this is an older question, but when I came to it I felt I still didn't understand why this behavior exists or how to write map/reduce functions so that it's a non-issue.

MongoDB doesn't call the reduce function when a key has only a single value because it isn't necessary (I hope this will make more sense in a moment). The following are requirements for reduce functions:

  • The reduce function must return an object whose type is identical to the type of the value emitted by the map function.
  • The order of the elements in the valuesArray should not affect the output of the reduce function.
  • The reduce function must be idempotent.

The first requirement is very important, and many people seem to overlook it: I've seen a number of people reshape their values inside the reduce function and then deal with the single-value case in the finalize function. This is the wrong way to address the issue, however.

Think about it like this: if a key has only a single value, a simple optimization is to skip the reducer entirely (there's nothing to reduce). Keys with a single value are still included in the output, but the purpose of the reducer is to build an aggregate result for keys that have more than one value. If the mapper and the reducer output the same type, you should not be able to tell from the structure of an output object whether it went through the reducer. You shouldn't have to use a finalize function to correct the structure of objects that never ran through the reducer.

In short, do your mapping in your map function, and in your reduce function combine the values of keys that occur more than once into a single, aggregate result.
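As an illustration of these requirements, here is a minimal sketch that runs outside MongoDB. The `{sum, count}` shape and the `mapValue` helper are assumptions made for this example (the question only sums plain numbers); the point is that the map output and the reduce output share one shape, so a value that skips reduce looks the same as one that went through it.

```javascript
// Hypothetical helper: the value a map function would emit for one document.
// The {sum, count} shape is an assumption chosen for illustration.
function mapValue(doc) {
  return { sum: doc.value, count: 1 };
}

// Returns the same shape that mapValue produces, so it can safely consume
// its own output when partial results are reduced again.
function reduce(key, values) {
  let sum = 0;
  let count = 0;
  for (const v of values) {
    sum += v.sum;
    count += v.count;
  }
  return { sum: sum, count: count };
}

const a = mapValue({ key: 5, value: 5 });
const b = mapValue({ key: 5, value: 4 });
const c = mapValue({ key: 5, value: 1 });

// Reducing everything at once and reducing in stages give the same result.
const allAtOnce = reduce(5, [a, b, c]);
const inStages = reduce(5, [reduce(5, [a, b]), c]);

// Order of the values does not matter.
const reordered = reduce(5, [c, a, b]);

// A key with one value: skipping reduce and calling it are indistinguishable.
const single = mapValue({ key: 3, value: 0 });
const singleReduced = reduce(3, [single]);

console.log(allAtOnce, inStages, reordered, single, singleReduced);
```

With a reducer written this way, no finalize step is needed to repair the keys that were never reduced; finalize could still be used for a genuinely final computation such as dividing `sum` by `count`.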




Answer 3:


Solution:

  • Add a new field in map: single: 0
  • In reduce, change this field to: single: 1
  • In finalize, check this field and take the required action

    $map = new MongoCode("function() {
        var value = {
            time: this.time,
            email_id: this.email_id,
            single: 0
        };
    
        emit(this.email, value);
    }");
    
    $reduce = new MongoCode("function(k, vals) {
    
        // perform whatever processing is needed here
        return {
            time: vals[0].time,
            email_id: vals[0].email_id,
            single: 1
        };
    }");
    
    $finalize = new MongoCode("function(key, reducedVal) {
        if (reducedVal.single == 0) {
            reducedVal.time = 11111;
        }
        return reducedVal;
    }");
    



Answer 4:


"MongoDB will not call the reduce function for a key that has only a single value. The values argument is an array whose elements are the value objects that are “mapped” to the key."

http://docs.mongodb.org/manual/reference/command/mapReduce/#mapreduce-reduce-cmd




Answer 5:


Is this natural behavior of mapreduce in general?

Yes.



Source: https://stackoverflow.com/questions/11021733/mongodb-mapreduce-emit-one-key-one-value-doesnt-call-reduce
