In mongo, how do I use map reduce to get a group by ordered by most recent

孤者浪人 提交于 2020-01-03 17:17:10

问题


the map reduce examples I see use aggregation functions like count, but what is the best way to get say the top 3 items in each category using map reduce.

I'm assuming I can also use the group function but was curious since they state sharded environments cannot use group(). However, I'm actually interested in seeing a group() example as well.


回答1:


For the sake of simplification, I'll assume you have documents of the form:

{category: <int>, score: <int>}

I've created 1000 documents covering 100 categories with:

for (var i=0; i<1000; i++) {
  db.foo.save({
    category: parseInt(Math.random() * 100),
    score: parseInt(Math.random() * 100)
  });
}

Our mapper is pretty simple, just emit the category as key, and an object containing an array of scores as the value:

mapper = function () {
  emit(this.category, {top:[this.score]});
}

MongoDB's reducer cannot return an array, and the reducer's output must be of the same type as the values we emit, so we must wrap it in an object. We need an array of scores, as this will let our reducer compute the top 3 scores:

reducer = function (key, values) {
  var scores = [];
  values.forEach(
    function (obj) {
      obj.top.forEach(
        function (score) {
          scores[scores.length] = score;
      });
  });
  scores.sort();
  scores.reverse();
  return {top:scores.slice(0, 3)};
}

Finally, invoke the map-reduce:

db.foo.mapReduce(mapper, reducer, "top_foos");

Now we have a collection containing one document per category, and the top 3 scores across all documents from foo in that category:

{ "_id" : 0, "value" : { "top" : [ 93, 89, 86 ] } }
{ "_id" : 1, "value" : { "top" : [ 82, 65, 6 ] } }

(Your exact values may vary if you used the same Math.random() data generator as I have above)

You can now use this to query foo for the actual documents having those top scores:

function find_top_scores(categories) {
  var query = [];
  db.top_foos.find({_id:{$in:categories}}).forEach(
    function (topscores) {
      query[query.length] = {
        category:topscores._id,
        score:{$in:topscores.value.top}
      };
  });
  return db.foo.find({$or:query});

}

This code won't handle ties, or rather, if ties exist, more than 3 documents might be returned in the final cursor produced by find_top_scores.

The solution using group would be somewhat similar, though the reducer will only have to consider two documents at a time, rather than an array of scores for the key.



来源:https://stackoverflow.com/questions/7290307/in-mongo-how-do-i-use-map-reduce-to-get-a-group-by-ordered-by-most-recent

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!