Reduce is called several times with the same key in mongodb map-reduce

十年热恋 提交于 2019-12-12 10:49:49

问题


I'm trying to run map reduce on mongodb in mongo shell. For some reason, in the reduce phase, I get several calls for the same key (instead of single one), so I get wrong results. I'm not an expert in this domains, so maybe I'm doing some stupid mistake. Any help appreciated.

Thanks.

This is my small example:

I'm creating 10000 documents:

var i = 0;
db.docs.drop();
while (i < 10000) {
    db.docs.insert({text:"line " + i,index:i});
    i++;
}

Then I'm doing map-reduce based on module 10 (so I except to get 1000 in each "bucket")

db.docs.mapReduce(
    function() { 
       emit(this.index%10,1);
    },
    function(key,values) {
       return values.length;
    },
    {
    out : {inline : 1}
    }
);

However, as results I get the following:

{
    "results" : [
        {
            "_id" : 0,
            "value" : 21
        },
        {
            "_id" : 1,
            "value" : 21
        },
        {
            "_id" : 2,
            "value" : 21
        },
        {
            "_id" : 3,
            "value" : 21
        },
        {
            "_id" : 4,
            "value" : 21
        },
        {
            "_id" : 5,
            "value" : 21
        },
        {
            "_id" : 6,
            "value" : 21
        },
        {
            "_id" : 7,
            "value" : 21
        },
        {
            "_id" : 8,
            "value" : 21
        },
        {
            "_id" : 9,
            "value" : 21
        }
    ],
    "timeMillis" : 76,
    "counts" : {
        "input" : 10000,
        "emit" : 10000,
        "reduce" : 500,
        "output" : 10
    },
    "ok" : 1,
}

回答1:


Map/Reduce is essentially a recursive operation. In particular, the documented requirements for the reduce function include the following statement:

MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key.

Therefore, you have to expect that the input is merely the number that was counted by a previous invocation. The following code does that by actually adding the values:

db.docs.mapReduce(
    function() { emit(this.index % 10, 1); }, 
    function(key,values) { return Array.sum(values); }, 
    { out : {inline : 1} } );

Now, the emit(key, 1) makes more sense in a way, because 1 is no longer just any number used to fill the array, but its value is considered.

As a sidenote, note how dangerous this is: For a smaller dataset, the correct result might have been given by accident, because the engine decided a parallelization wouldn't be necessary.



来源:https://stackoverflow.com/questions/19252830/reduce-is-called-several-times-with-the-same-key-in-mongodb-map-reduce

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!