How to efficiently page batches of results with MongoDB

淺唱寂寞╮ submitted on 2019-12-06 10:51:34

In order to efficiently "page" through results in the way that you want, it is better to use a "range query" and keep the last value you processed.

Your desired "sort key" here is _id, so that makes things simple:

First, you want your index in the correct order. Create it with .createIndex(), which is the current, non-deprecated method:

db.collection.createIndex({ "language": 1, "_id": -1 })

Then you want to do some simple processing, from the start:

var lastId = null;

var cursor = db.collection.find({language:"hi"});
cursor.sort({_id:-1}).limit(5000).forEach(function(doc) {
    // Do something with your document, but always run the next line
    lastId = doc._id;
})

That's the first batch. Now when you move on to the next one:

var cursor = db.collection.find({ "language":"hi", "_id": { "$lt": lastId } });
cursor.sort({_id:-1}).limit(5000).forEach(function(doc) {
    // Do something with your document, but always run the next line
    lastId = doc._id;
})

This way the lastId value is always taken into account when making the selection. Store it between batches and continue on from where the last one ended.
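
The two snippets above can be folded into one loop. What follows is only a minimal sketch: buildBatchFilter, processAllBatches, fetchBatch and handleDoc are hypothetical names introduced for illustration, and fetchBatch is assumed to return an array of documents already sorted by _id descending.

```javascript
// Build the query filter for one batch. `lastId` is null for the first batch.
function buildBatchFilter(language, lastId) {
    var filter = { language: language };
    if (lastId !== null) {
        filter._id = { $lt: lastId };      // only documents not yet processed
    }
    return filter;
}

// Process every batch in turn. `fetchBatch(filter, limit)` is a hypothetical
// callback that returns an array of documents sorted by _id descending.
function processAllBatches(language, batchSize, fetchBatch, handleDoc) {
    var lastId = null;
    var total = 0;
    while (true) {
        var docs = fetchBatch(buildBatchFilter(language, lastId), batchSize);
        if (docs.length === 0) break;       // nothing left to process
        docs.forEach(function (doc) {
            handleDoc(doc);
            lastId = doc._id;               // remember where this batch ended
            total++;
        });
        if (docs.length < batchSize) break; // a short batch is the last batch
    }
    return total;
}

// Possible use in the mongo shell (assumes the collection from the text):
// processAllBatches("hi", 5000, function (filter, limit) {
//     return db.collection.find(filter).sort({ _id: -1 }).limit(limit).toArray();
// }, function (doc) { /* do something with your document */ });
```

Keeping the filter construction separate from the loop means lastId can be saved somewhere durable between runs and passed back in later.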

That is much more efficient than processing with .skip(), which, regardless of the index, still needs to step through all data in the collection up to the skip point.

Using the $lt operator here filters out all the results you have already processed, so you can move along much more quickly.
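
To make the difference concrete, here is a toy model in plain JavaScript. It is not how MongoDB is implemented: a sorted array stands in for one branch of the index, a binary search stands in for the index seek, and the function names and the "examined" counter are assumptions made up for this illustration.

```javascript
// ids: _id values sorted descending, standing in for the index entries of
// one language. Each function returns the page plus a count of the entries
// it had to visit to produce it.

function pageWithSkip(ids, skip, limit) {
    var examined = 0;
    var page = [];
    for (var i = 0; i < ids.length && page.length < limit; i++) {
        examined++;                        // skipped entries are still visited
        if (i >= skip) page.push(ids[i]);
    }
    return { page: page, examined: examined };
}

function pageWithRange(ids, lastId, limit) {
    // Binary search for the first position whose id is below lastId.
    var lo = 0, hi = ids.length, examined = 0;
    while (lo < hi) {
        var mid = (lo + hi) >> 1;
        examined++;
        if (ids[mid] < lastId) hi = mid; else lo = mid + 1;
    }
    var page = ids.slice(lo, lo + limit);
    return { page: page, examined: examined + page.length };
}
```

In this model the cost of the skip version grows with how far into the results you already are, while the range version stays close to the size of the page itself.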

When you want to sort descending, you should create a multi-field index which uses the field(s) you sort on as descending field(s). You do that by setting those field(s) to -1.

This index should greatly increase the performance of your sort:

db.collection.createIndex({ language: 1, _id: -1 });

When you also want to speed up the other case - retrieving sorted in ascending order - create a second index like this:

db.collection.createIndex({ language: 1, _id: 1 });
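
If you page in ascending order, the range condition flips as well: sort on _id: 1 and continue with $gt instead of $lt. A small sketch of a helper that covers both directions follows; buildPageQuery and its parameters are hypothetical names, not part of any MongoDB API.

```javascript
// direction: -1 for descending _id (newest first), 1 for ascending.
// lastId is null for the first page.
function buildPageQuery(language, lastId, direction) {
    var filter = { language: language };
    if (lastId !== null) {
        // Descending pages continue below lastId, ascending pages above it.
        filter._id = direction === -1 ? { $lt: lastId } : { $gt: lastId };
    }
    return { filter: filter, sort: { _id: direction } };
}

// Possible use in the mongo shell:
// var q = buildPageQuery("hi", lastId, 1);
// db.collection.find(q.filter).sort(q.sort).limit(5000)
```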

Keep in mind that when you do not sort your results, you receive them in natural order. Natural order is often insertion order, but there is no guarantee of that. Various events can disturb the natural order, so when you care about the order you should always sort explicitly. The only exception to this rule is capped collections, which always maintain insertion order.
