Aggregate framework can't use indexes

余生颓废 提交于 2019-12-23 21:57:13

问题


I run this command:

db.ads_view.aggregate({$group: {_id : "$campaign", "action" : {$sum: 1} }});

ads_view : 500 000 documents.

this queries take 1.8s . this is its profile : https://gist.github.com/afecec63a994f8f7fd8a

indexed : db.ads_view.ensureIndex({campaign: 1});

But mongodb don't use index. Anyone know if can aggregate framework use indexes, how to index this query.


回答1:


The $group operator is not one of the ones that will use an index currently. The list of operators that do (as of 2.2) are:

$match
$sort
$limit
$skip

From here:

http://docs.mongodb.org/manual/applications/aggregation/#pipeline-operators-and-indexes

Based on the number of yields going on in the gist, I would assume you either have a very active instance or that a lot of this data is not in memory when you are doing the group (it will yield on page fault usually too), hence the 1.8s

Note that even if $group could use an index, and your index covered everything being grouped, it would still involve a full scan of the index to do the group, and would likely not be terrible fast anyway.




回答2:


This is a late answer, but since $group in Mongo as of version 4.0 still won't make use of indexes, it may be helpful for others.

To speed up your aggregation significantly, performe a $sort before $group.

So your query would become:

db.ads_view.aggregate({$sort:{"campaign":1}},{$group: {_id : "$campaign", "action" : {$sum: 1} }});

This assumes an index on campaign, which should have been created according to your question. In Mongo 4.0, create the index with db.ads_view.createIndex({campaign:1}).

I tested this on a collection containing 5.5+ Mio. documents. Without $sort, the aggregation would not have finished even after several hours; with $sort preceeding $group, aggregation is taking a couple of seconds.




回答3:


$group doesn't use an index because it doesn't have to. When you $group your items you're essentially indexing all documents passing through the $group stage of the pipeline using your $group's _id. If you used an index that matched the $group's _id, you'd still have to pass through all the docs in the index so it's the same amount of work.



来源:https://stackoverflow.com/questions/13466079/aggregate-framework-cant-use-indexes

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!