Query performance issue for large nested data in MongoDB

I'm trying to query results from a large dataset called 'tasks' containing 187297 documents which are nested into another dataset called 'w…

1 Answer

眼角桃花

2020-12-11 10:21

Without the explain() output for the query, it's impossible to know for sure what the bottleneck is. However, here is some advice on how to improve this query:

Use a single $project stage at the end of the pipeline

The query contains five $project stages when only one is actually needed. Each of them adds overhead, especially when applied to a large number of documents. Instead, use dot notation to reach nested fields, for example:

{ "$unwind": "$workers.tasks" }
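To make the dot-notation paths concrete, here is a minimal in-memory imitation, in plain JavaScript, of what `{ "$unwind": "$workers" }` followed by `{ "$unwind": "$workers.tasks" }` does to the nested data. It is only a sketch of the default $unwind behaviour (no preserveNullAndEmptyArrays), not MongoDB code; `getPath`, `setPath`, `unwind` and the sample document are hypothetical.

```javascript
// Read a dot-notation path such as "workers.tasks" from a plain object.
function getPath(doc, path) {
  return path.split(".").reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

// Return a copy of doc with the value at the dot-notation path replaced
// (only the objects along the path are copied; the input is not mutated).
function setPath(doc, path, newValue) {
  const [head, ...rest] = path.split(".");
  const copy = { ...doc };
  copy[head] = rest.length === 0 ? newValue : setPath(doc[head] ?? {}, rest.join("."), newValue);
  return copy;
}

// Imitates { $unwind: "$<path>" } with default options: one output document per
// array element; missing, null or empty arrays produce no output, and a
// non-array value passes through unchanged.
function unwind(docs, path) {
  return docs.flatMap((doc) => {
    const value = getPath(doc, path);
    if (!Array.isArray(value)) return value == null ? [] : [doc];
    return value.map((element) => setPath(doc, path, element));
  });
}

// Hypothetical sample shaped like the question's data: workers -> tasks.
const sample = [
  {
    _id: 1,
    workers: [
      { worker_number: 7, tasks: [{ task_number: "A", start: 3 }, { task_number: "B", start: null }] },
      { worker_number: 8, tasks: [{ task_number: "C", start: 1 }] },
    ],
  },
];

const rows = unwind(unwind(sample, "workers"), "workers.tasks");
// → 7/A, 7/B, 8/C
console.log(rows.map((row) => `${row.workers.worker_number}/${row.workers.tasks.task_number}`));
```

Each output row carries exactly one task under `workers.tasks`, which is why a single final $project can read every field directly with paths such as `$workers.tasks.start` and `$workers.worker_number`.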

Call $match as early as possible

$match removes documents from the pipeline, so place it as early as possible: every later aggregation stage then runs on fewer documents.
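A rough way to see why the position of $match matters: the sketch below runs plain-JavaScript stand-ins for $match and $project over 1,000 fake task rows and records how many rows each stage receives. The helper `runStages`, the stage functions and the data are all hypothetical, and MongoDB's own pipeline optimizer is not modelled here.

```javascript
// Hypothetical helper: runs plain-JS "stages" in order and records how many
// documents each stage had to process. It only imitates the idea of an
// aggregation pipeline; it is not MongoDB code.
function runStages(docs, stages) {
  const processed = [];
  let current = docs;
  for (const stage of stages) {
    processed.push(current.length); // documents entering this stage
    current = stage(current);
  }
  return { output: current, processed };
}

// Stand-ins for $match and $project on already-unwound task rows.
const matchStarted = (rows) => rows.filter((row) => row.start != null);
const projectTask = (rows) => rows.map((row) => ({ task: row._id, start: row.start }));

// 1,000 fake task rows, of which only every 100th has a start value.
const rows = Array.from({ length: 1000 }, (_, i) => ({ _id: i, start: i % 100 === 0 ? i : null }));

const early = runStages(rows, [matchStarted, projectTask]); // $match first
const late = runStages(rows, [projectTask, matchStarted]); // $match last
// processed: [1000, 10] vs [1000, 1000]
console.log(early.processed, late.processed);
```

Both orders return the same 10 rows, but with $match first the projection touches 10 rows instead of 1,000.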

Call $skip and $limit before $project

Because the query returns only 10 documents, there is no need to apply the $project stage to the roughly 180,000 other documents.
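As an illustration of that ordering, a hypothetical helper `pageStages` (not part of any MongoDB API) could build the tail of the pipeline from a zero-based page number, so that $skip and $limit always come before the single $project:

```javascript
// Hypothetical helper: builds the tail of the aggregation pipeline so that
// $skip and $limit run before the $project stage.
function pageStages(page, pageSize, projection) {
  if (!Number.isInteger(page) || page < 0) throw new RangeError("page must be a non-negative integer");
  if (!Number.isInteger(pageSize) || pageSize <= 0) throw new RangeError("pageSize must be a positive integer");
  return [
    { $sort: { "workers.tasks.start": 1 } },
    { $skip: page * pageSize }, // drop earlier pages
    { $limit: pageSize }, // keep only one page
    { $project: projection }, // reshape just these rows
  ];
}

const tail = pageStages(2, 10, {
  task_number: "$workers.tasks.task_number",
  start: "$workers.tasks.start",
});
console.log(tail);
```

With a page size of 10, page 0 gives `{ "$skip": 0 }` as in the main query, and page 2 skips 20 rows.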

Properly index the field used for sorting

This is likely to be the bottleneck. Make sure that the field workers.tasks.start is indexed (see MongoDB createIndex() for details; the older ensureIndex() is deprecated).
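A minimal sketch of creating that index, assuming the Node.js MongoDB driver is used (the answer itself only shows shell queries); `ensureStartIndex` is a hypothetical name. Keep in mind that, according to the MongoDB documentation, $match and $sort can use an index only when they occur at the beginning of the pipeline. In the main query the $sort comes after two $unwind stages, so it is worth checking with explain() whether the index is actually used.

```javascript
// Sketch for the Node.js MongoDB driver; `collection` is assumed to be a
// driver Collection. The mongo shell equivalent is:
//   db.collection.createIndex({ "workers.tasks.start": 1 })
async function ensureStartIndex(collection) {
  // Ascending single-field index on the nested sort key. Because workers and
  // tasks are arrays, MongoDB builds it as a multikey index. If an identical
  // index already exists, createIndex does not rebuild it.
  return collection.createIndex({ "workers.tasks.start": 1 });
}
```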

Do not compute the number of documents returned in the query

Instead of counting the matching documents with $group/$unwind stages in the same pipeline, run a second query at the same time that only counts the matching documents.
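Here is a sketch of running the page query and the count query at the same time, again assuming the Node.js MongoDB driver, where `collection.aggregate(pipeline)` returns a cursor with `toArray()`. `fetchTasksPage` and `baseStages` are hypothetical names; the stages themselves are the ones used in this answer's pipelines, with the $project left out for brevity.

```javascript
// Stages shared by both queries, copied from the pipelines in this answer.
const baseStages = [
  { $unwind: "$workers" },
  { $unwind: "$workers.tasks" },
  { $match: { "workers.tasks.start": { $ne: null } } },
];

// Hypothetical helper: fetches one page of tasks and the total count in parallel.
// `collection` is assumed to be a Node.js driver Collection.
async function fetchTasksPage(collection, skip, limit) {
  const pagePipeline = [
    ...baseStages,
    { $sort: { "workers.tasks.start": 1 } },
    { $skip: skip },
    { $limit: limit },
    // the single $project stage from the main query would go here
  ];
  const countPipeline = [...baseStages, { $count: "entries_count" }];

  // Both aggregations are sent at once instead of one after the other.
  const [tasks, countResult] = await Promise.all([
    collection.aggregate(pagePipeline).toArray(),
    collection.aggregate(countPipeline).toArray(),
  ]);

  // $count outputs no document at all when nothing matches, so default to 0.
  const total = countResult.length > 0 ? countResult[0].entries_count : 0;
  return { tasks, total };
}
```

Sharing `baseStages` keeps the filter of the two queries in sync, so the total always matches the rows being paged through.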

The main query now looks like this:

db.collection.aggregate([
  { "$unwind": "$workers" },
  { "$unwind": "$workers.tasks" },
  { "$match": { "workers.tasks.start": { "$ne": null } } },
  { "$sort": { "workers.tasks.start": 1 } },
  { "$skip": 0 },
  { "$limit": 10 },
  {
    "$project": {
      "task_number": "$workers.tasks.task_number",
      "pieces_actual": "$workers.tasks.pieces_actual",
      "minutes_elapsed": "$workers.tasks.minutes_elapsed",
      "worker_number": "$workers.worker_number",
      "start": "$workers.tasks.start",
      "inbound_order_number": "$workers.tasks.inbound_order_number",
      "pause_from": "$workers.tasks.pause_from",
      "date": "$workers.tasks.date",
      "_id": "$workers.tasks._id",
      "pause_to": "$workers.tasks.pause_to"
    }
  }
])

You can try it here: mongoplayground.net/p/yua7qspo2Jj

The count query would be:

db.collection.aggregate([
  { "$unwind": "$workers" },
  { "$unwind": "$workers.tasks" },
  { "$match": { "workers.tasks.start": { "$ne": null } } },
  { "$count": "entries_count" }
])
