Query performance issue for large nested data in MongoDB

假如想象 submitted on 2019-12-06 08:07:33

Without the explain() output for the query, it's impossible to know for sure what the bottleneck is. However, here is some advice on how to improve this query:


Use a single $project stage at the end of the pipeline

The query contains five $project stages when only one is actually needed. Each extra stage adds overhead, especially when it is applied to a large number of documents. Instead, use dot notation to reference nested fields directly, for example:

{ "$unwind": "$workers.tasks" }
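As a sketch of the single-$project idea, the hypothetical helper `taskProjection` below builds one $project stage that reads the nested fields directly with dot notation. The field names come from the query in this answer; the helper itself is only an illustration and not part of MongoDB.

```javascript
// Hypothetical helper: builds ONE $project stage that reads nested fields
// directly with dot notation, instead of flattening them over several stages.
function taskProjection(taskFields, workerFields = []) {
  const spec = {};
  // Fields that live on each task inside workers.tasks
  for (const f of taskFields) spec[f] = `$workers.tasks.${f}`;
  // Fields that live on the worker itself
  for (const f of workerFields) spec[f] = `$workers.${f}`;
  return { $project: spec };
}

const stage = taskProjection(["task_number", "start", "_id"], ["worker_number"]);
console.log(stage.$project._id); // $workers.tasks._id
console.log(stage.$project.worker_number); // $workers.worker_number
```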

Call $match as early as possible

$match removes the documents that don't satisfy its filter, so place it as early as possible: every later aggregation stage then runs on fewer documents.
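The effect can be illustrated outside MongoDB with a plain JavaScript model. This is only an analogy under stated assumptions: `runWithCounter` and `expensiveStage` are hypothetical, and `expensiveStage` stands in for any later aggregation stage whose cost grows with the number of documents it receives.

```javascript
// Plain-JavaScript model, not MongoDB: `expensiveStage` stands in for any
// later aggregation stage, and `calls` counts how many documents it processes.
function runWithCounter(docs, filterFirst) {
  let calls = 0;
  const expensiveStage = (doc) => {
    calls += 1;
    return { ...doc, processed: true };
  };
  // Roughly mirrors { $ne: null }: drops tasks whose start is null or missing.
  const keep = (doc) => doc.start != null;
  const out = filterFirst
    ? docs.filter(keep).map(expensiveStage) // filter early
    : docs.map(expensiveStage).filter(keep); // filter late
  return { count: out.length, calls };
}

const tasks = [{ start: 3 }, { start: null }, { start: 1 }, {}];
console.log(runWithCounter(tasks, false)); // { count: 2, calls: 4 }
console.log(runWithCounter(tasks, true)); // { count: 2, calls: 2 }
```

Both orders return the same documents; only the amount of work done by the later stage changes.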

Call $skip and $limit before $project

Since the query returns only 10 documents, there is no need to apply the $project stage to the 180000 other documents.
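The ordering can be captured in a small pipeline-building function. This is a hedged sketch: `pageStages` is a hypothetical helper, `page` is assumed to be 0-based, and the sort key is the one used in the query in this answer.

```javascript
// Hypothetical helper: returns the tail of the pipeline for one page.
// $sort, $skip and $limit come first, so $project only sees pageSize documents.
function pageStages(page, pageSize, projection) {
  return [
    { $sort: { "workers.tasks.start": 1 } },
    { $skip: page * pageSize }, // assumes page is 0-based
    { $limit: pageSize },
    { $project: projection },
  ];
}

const stages = pageStages(2, 10, { start: "$workers.tasks.start" });
console.log(stages.map((s) => Object.keys(s)[0]).join(" -> "));
// $sort -> $skip -> $limit -> $project
console.log(stages[1].$skip); // 20
```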

Properly index the field used for sorting

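This is likely the bottleneck. Make sure that the field workers.tasks.start is indexed (see the MongoDB createIndex() documentation for details; ensureIndex() is its deprecated predecessor). Keep in mind that an index can only be used by $match and $sort stages placed at the beginning of the pipeline, before stages such as $unwind.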

Do not compute the number of matching documents in the main query

Instead of using $group/$unwind stages in the main query to count the matching documents, run a separate query at the same time that only counts them.


The main query now looks like this:

db.collection.aggregate([{
        "$unwind": "$workers"
    }, {
        "$unwind": "$workers.tasks"
    }, {
        "$match": {
            "workers.tasks.start": {
                "$ne": null
            }
        }
    },
    {
        "$sort": {
            "workers.tasks.start": 1
        }
    }, {
        "$skip": 0
    }, {
        "$limit": 10
    },
    {
        "$project": {
            "task_number": "$workers.tasks.task_number",
            "pieces_actual": "$workers.tasks.pieces_actual",
            "minutes_elapsed": "$workers.tasks.minutes_elapsed",
            "worker_number": "$workers.worker_number",
            "start": "$workers.tasks.start",
            "inbound_order_number": "$workers.tasks.inbound_order_number",
            "pause_from": "$workers.tasks.pause_from",
            "date": "$workers.tasks.date",
            "_id": "$workers.tasks._id",
            "pause_to": "$workers.tasks.pause_to"
        }
    }
])

You can try it here: mongoplayground.net/p/yua7qspo2Jj

The count query would be:

db.collection.aggregate([{
        "$unwind": "$workers"
    }, {
        "$unwind": "$workers.tasks"
    }, {
        "$match": {
            "workers.tasks.start": {
                "$ne": null
            }
        }
    },
    {
        "$count": "entries_count"
    }
])
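The main query and the count query can be issued concurrently so that counting does not delay the page of results. The sketch below assumes the official Node.js driver, where `collection.aggregate(pipeline)` returns a cursor whose `toArray()` resolves to an array of documents; `fetchPageAndCount` is a hypothetical name. It also defaults the total to 0, because a $count stage emits no document at all when nothing matches.

```javascript
// Sketch assuming the official Node.js driver: aggregate() returns a cursor,
// and cursor.toArray() returns a Promise of the result documents.
async function fetchPageAndCount(collection, pagePipeline, countPipeline) {
  // Start both aggregations before awaiting either one.
  const [items, countDocs] = await Promise.all([
    collection.aggregate(pagePipeline).toArray(),
    collection.aggregate(countPipeline).toArray(),
  ]);
  // $count outputs nothing (an empty array here) when no documents match.
  const total = countDocs.length > 0 ? countDocs[0].entries_count : 0;
  return { items, total };
}
```

Here `collection` would be a driver `Collection` (for example obtained via `client.db(...).collection(...)`), `pagePipeline` the main query and `countPipeline` the count query.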

