MongoDB - Aggregation Framework (Total Count)

问题

When running a normal "find" query on MongoDB I can get the total result count (regardless of limit) by running "count" on the returned cursor. So, even if I limit to result set to 10 (for example) I can still know that the total number of results was 53 (again, for example).

If I understand it correctly, the aggregation framework, however, doesn't return a cursor but simply the results. And so, if I used the $limit pipeline operator, how can I know the total number of results regardless of said limit?

I guess I could run the aggregation twice (once to count the results via $group, and once with $limit for the actual limited results), but this seems inefficient.

An alternative approach could be to attach the total number of results to the documents (via $group) prior to the $limit operation, but this also seems inefficient as this number will be attached to every document (instead of just returned once for the set).

Am I missing something here? Any ideas? Thanks!

For example, if this is the query:

db.article.aggregate(
    { $group : {
        _id : "$author",
        posts : { $sum : 1 }
    }},
    { $sort : { posts: -1 } },
    { $limit : 5 }
);

How would I know how many results are available (before $limit)? The result isn't a cursor, so I can't just run count on it.

回答1:

Assaf, there's going to be some enhancements to the aggregation framework in the near future that may allow you to do your calculations in one pass easily, but right now, it is best to perform your calculations by running two queries in parallel: one to aggregate the #posts for your top authors, and another aggregation to calculate the total posts for all authors. Also, note that if all you need to do is a count on documents, using the count function is a very efficient way of performing the calculation. MongoDB caches counts within btree indexes allowing for very quick counts on queries.

If these aggregations turn out to be slow there are a couple of strategies. First off, keep in mind that you want start the query with a $match if applicable to reduce the result set. $matches can also be speed up by indexes. Secondly, you can perform these calculations as pre-aggregations. Instead of possible running these aggregations every time a user accesses some part of your app, have the aggregations run periodically in the background and store the aggregations in a collection that contains pre-aggregated values. This way, your pages can simply query the pre-calculated values from this collection.

回答2:

There is a solution using push and slice: https://stackoverflow.com/a/39784851/4752635 (@emaniacs mentions it here as well).

But I prefer using 2 queries. Solution with pushing $$ROOT and using $slice runs into document memory limitation of 16MB for large collections. Also, for large collections two queries together seem to run faster than the one with $$ROOT pushing. You can run them in parallel as well, so you are limited only by the slower of the two queries (probably the one which sorts).

First for filtering and then grouping by ID to get number of filtered elements. Do not filter here, it is unnecessary.
Second query which filters, sorts and paginates.

I have settled with this solution using 2 queries and aggregation framework (note - I use node.js in this example):

var aggregation = [
  {
    // If you can match fields at the begining, match as many as early as possible.
    $match: {...}
  },
  {
    // Projection.
    $project: {...}
  },
  {
    // Some things you can match only after projection or grouping, so do it now.
    $match: {...}
  }
];


// Copy filtering elements from the pipeline - this is the same for both counting number of fileter elements and for pagination queries.
var aggregationPaginated = aggregation.slice(0);

// Count filtered elements.
aggregation.push(
  {
    $group: {
      _id: null,
      count: { $sum: 1 }
    }
  }
);

// Sort in pagination query.
aggregationPaginated.push(
  {
    $sort: sorting
  }
);

// Paginate.
aggregationPaginated.push(
  {
    $limit: skip + length
  },
  {
    $skip: skip
  }
);

// I use mongoose.

// Get total count.
model.count(function(errCount, totalCount) {
  // Count filtered.
  model.aggregate(aggregation)
  .allowDiskUse(true)
  .exec(
  function(errFind, documents) {
    if (errFind) {
      // Errors.
      res.status(503);
      return res.json({
        'success': false,
        'response': 'err_counting'
      });
    }
    else {
      // Number of filtered elements.
      var numFiltered = documents[0].count;

      // Filter, sort and pagiante.
      model.request.aggregate(aggregationPaginated)
      .allowDiskUse(true)
      .exec(
        function(errFindP, documentsP) {
          if (errFindP) {
            // Errors.
            res.status(503);
            return res.json({
              'success': false,
              'response': 'err_pagination'
            });
          }
          else {
            return res.json({
              'success': true,
              'recordsTotal': totalCount,
              'recordsFiltered': numFiltered,
              'response': documentsP
            });
          }
      });
    }
  });
});

回答3:

If you don't want to run two queries in parallel (one to aggregate the #posts for your top authors, and another aggregation to calculate the total posts for all authors) you can just remove $limit on pipeline and on results you can use

totalCount = results.length;
results.slice(number of skip,number of skip + number of limit);

ex:

db.article.aggregate([
    { $group : {
        _id : "$author",
        posts : { $sum : 1 }
    }},
    { $sort : { posts: -1 } }
    //{$skip : yourSkip},    //--remove this
    //{ $limit : yourLimit }, // remove this too
]).exec(function(err, results){
  var totalCount = results.length;//--GEt total count here
   results.slice(yourSkip,yourSkip+yourLimit);
});

回答4:

I got the same problem, and solved with $project, $slice and $$ROOT.

db.article.aggregate(
{ $group : {
    _id : '$author',
    posts : { $sum : 1 },
    articles: {$push: '$$ROOT'},
}},
{ $sort : { posts: -1 } },
{ $project: {total: '$posts', articles: {$slice: ['$articles', from, to]}},
).toArray(function(err, result){
    var articles = result[0].articles;
    var total = result[0].total;
});

You need to declare from and to variable.

https://docs.mongodb.com/manual/reference/operator/aggregation/slice/

回答5:

I get total count with aggregate().toArray().length

回答6:

$facets aggregation operation can be used for Mongo versions >= 3.4. This allows to fork at a particular stage of a pipeline in multiple sub-pipelines allowing in this case to build one sub pipeline to count the number of documents and another one for sorting, skipping, limiting.

This allows to avoid making same stages multiple times in multiple requests.

回答7:

in my case, we use $out stage to dump result set from aggeration into a temp/cache table, then count it. and, since we need to sort and paginate results, we add index on the temp table and save table name in session, remove the table on session closing/cache timeout.

来源：https://stackoverflow.com/questions/17767580/mongodb-aggregation-framework-total-count

标签

mongodb

mongodb-query