removing “scanAndOrder” : true in my MongoDB query result

人盡茶涼 提交于 2019-12-04 08:07:36

Thanks for the shoutout on Dex!

If it's not too late here in 2013, the index I recommend for avoiding scanAndOrder here is { _id: -1, cl: 1, user_id: 1 }.

The reason is because an $lt on _id and an $in on user_id constitute ranges across multiple index "buckets". An index of any other order than the above means those buckets must still be sorted together to satisfy a sort on _id. By putting _id first, all documents visited in the index will be properly-ordered in advance.

Note that this is a slight improvement over Andre's suggestion ({ _id: -1, user_id: 1, cl: 1 }, which should also avoid scanAndOrder) because it allows the straight equivalency check on cl to prune results.

Check out http://blog.mongolab.com/2012/06/cardinal-ins/ for more detail.

Because you are using a $lt I don't know if you are going to be able to remove a scanAndOrder operation from your query. The regular axiom is that you have the sort field as the last member of your index, but this breaks down where there is a range query going on. You can usually resolve this by inverting the order of the index, inserting the sort field as the first member. There is an additional problem in your case, as the collection is sharded and therefore you will likely always choose at least the shard key index over the 'inverted order index'.

Without knowing the distribution of your data, it is hard to recommend a specific course of action. From the brief testing I have done, adding an index to {cl:1,user_id:1,_id:-1} roughly halved the nscanned and nscannedobjects. This would be using the sort field as the last member of the index, but has the pitfall as discussed above. You could also try the inverse of this {_id:1,user_id:1,cl:-1} but you might find the shard key is going to get picked over that index. You could further try forcing the inverse index with a hint, but this did not lead to any performance gains in my testing.

Try Dex to make sure that the index is working the way you think it should be: https://github.com/mongolab/dex

AFAIK, conditional operators $gt, $lt etc. (comparison functions in mongo queries) makes mongo to not use an index at all (for that part of query). Mongo must scan all documents in a collection to execute such query.

So, this part:

find({ cl:"2012-03", user_id:{$in:users}, _id:{"$lt":new ObjectId('4f788b54204cfa4946000044')} })

will not use index, even if it exists. That makes it slow.

UPDATE Conditional operators $gt, $lt etc. in query allows to use index, but not that efficient as without them. Still mongodb has to scan more documents that returns in result.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!