mongodb: How to use an index for distinct command and query?

Submitted by 扶醉桌前 on 2020-01-03 17:26:11

Question


I have a problem with very slow distinct commands that use a query. From what I have observed, the distinct command only makes use of an index if you do not specify a query:

I have created a test database on my MongoDB 3.0.10 server with 1 million documents. Each document looks as follows:

{
    "_id" : ObjectId("56e7fb5303858265f53c0ea1"),
    "field1" : "field1_6",
    "field2" : "field2_10",
    "field3" : "field3_29",
    "field4" : "field4_64"
}

The number at the end of each field value is random, from 0 to 99.
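To reproduce a similar collection, a minimal sketch in mongo shell JavaScript could look like the following. The function names `makeTestDoc` and `generateTestData` and the injectable `randInt` and `insertBatch` parameters are hypothetical helpers for illustration, not part of the original setup; the code sticks to ES5 syntax so that older shells can run it.

```javascript
// Minimal sketch: build one document shaped like the example above, where
// each fieldN value ends in a random number 0..99.
// randInt(n) must return an integer in [0, n); it is a hypothetical,
// injectable parameter so the generator can be tested deterministically.
function makeTestDoc(randInt) {
  var doc = {};
  for (var i = 1; i <= 4; i++) {
    doc["field" + i] = "field" + i + "_" + randInt(100);
  }
  return doc;
}

// Generate `count` documents and hand them to insertBatch in chunks of
// batchSize, so a million documents never sit in memory at once.
function generateTestData(count, batchSize, insertBatch, randInt) {
  var batch = [];
  for (var i = 0; i < count; i++) {
    batch.push(makeTestDoc(randInt));
    if (batch.length === batchSize) {
      insertBatch(batch);
      batch = [];
    }
  }
  if (batch.length > 0) {
    insertBatch(batch); // flush the final partial batch
  }
}

// In the mongo shell, for example:
// generateTestData(1000000, 1000,
//   function (batch) { db.dbtest.insert(batch); },
//   function (n) { return Math.floor(Math.random() * n); });
```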

On the collection, two simple indexes and one compound index have been created:

db.dbtest.createIndex({ "field1" : 1 })  // simple index on "field1"
db.dbtest.createIndex({ "field2" : 1 })  // simple index on "field2"
db.dbtest.createIndex({                  // compound index on all fields
    "field2" : 1,
    "field1" : 1,
    "field3" : 1,
    "field4" : 1
})

Now I execute distinct commands on that collection:

db.runCommand({ distinct: 'dbtest', key: 'field1' })

The result contains 100 values, nscanned=100, and it used the index on "field1".

Now the same distinct command is restricted by a query:

db.runCommand({ distinct: 'dbtest', key: 'field1', query: { field2: "field2_10" } })

Again the result contains 100 values; however, nscanned=9991 and the index used is the third one, the compound index on all fields.

Next, the compound index used by the last query is dropped and the same query is executed again:

db.runCommand({ distinct: 'dbtest', key: 'field1', query: { field2: "field2_10" } })

Again the result contains 100 values, nscanned=9991, and the index used is the one on "field2".

Conclusion: if I execute a distinct command without a query, the result is taken directly from an index. However, when I combine a distinct command with a query, only the query uses an index; the distinct operation itself does not use an index in that case.

My problem is that I need to run a distinct command with a query on a very large database. The set of matching documents is very large but contains only ~100 distinct values. Therefore the distinct command takes ages (more than 5 minutes), as it has to cycle through all matching values.
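Why an index could answer such a query cheaply can be illustrated with a small in-memory model. The sketch below is not MongoDB's implementation; it is a hypothetical model in which the compound index is a sorted array of `[field2, field1]` keys. A plain range scan reads every key whose `field2` matches, while a skip scan reads one key and then seeks past all remaining keys with the same `field1`, so it touches roughly one key per distinct value. All function names here are made up for the illustration.

```javascript
// Compare two [field2, field1] index keys lexicographically.
function compareKeys(a, b) {
  if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1;
  if (a[1] !== b[1]) return a[1] < b[1] ? -1 : 1;
  return 0;
}

// Binary search: first position whose key satisfies a predicate that is
// false for a prefix of the sorted array and true for the rest.
function firstWhere(keys, pred) {
  var lo = 0, hi = keys.length;
  while (lo < hi) {
    var mid = (lo + hi) >>> 1;
    if (pred(keys[mid])) hi = mid; else lo = mid + 1;
  }
  return lo;
}

// Plain range scan: read every key with field2 === value.
function distinctByRangeScan(keys, value) {
  var seen = {}, out = [], examined = 0;
  var pos = firstWhere(keys, function (k) { return k[0] >= value; });
  for (; pos < keys.length && keys[pos][0] === value; pos++) {
    examined++;
    var f1 = keys[pos][1];
    if (!Object.prototype.hasOwnProperty.call(seen, f1)) {
      seen[f1] = true;
      out.push(f1);
    }
  }
  return { values: out, keysExamined: examined };
}

// Skip scan: after reading one key, seek directly to the first key that is
// greater than [value, current field1], skipping its duplicates.
function distinctBySkipScan(keys, value) {
  var out = [], examined = 0;
  var pos = firstWhere(keys, function (k) { return k[0] >= value; });
  while (pos < keys.length && keys[pos][0] === value) {
    var f1 = keys[pos][1];
    examined++;
    out.push(f1);
    pos = firstWhere(keys, function (k) { return compareKeys(k, [value, f1]) > 0; });
  }
  return { values: out, keysExamined: examined };
}
```

With 1 million documents and 100 values per field, a range scan over one `field2` value reads on the order of 10,000 keys, in line with the nscanned=9991 reported above, whereas the skip scan in this model would read about 100 keys plus the cost of each seek.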

What needs to be done so that the distinct command presented above can be answered by the database directly from an index?


Answer 1:


Using an index for a distinct command with a query requires MongoDB 3.4 or higher; it works with both the MMAPv1 and WiredTiger storage engines.

See also the bug ticket https://jira.mongodb.org/browse/SERVER-19507
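To check whether a particular server actually answers the distinct from the index, the plan can be inspected with the `explain` command. The sketch below assumes MongoDB 3.4 or later, where `explain` accepts a `distinct` command and the winning plan is expected to contain a `DISTINCT_SCAN` stage when an index (for example, the existing compound index that starts with `field2` followed by `field1`) is used for the distinct itself. The helper names `distinctExplainCommand` and `usesDistinctScan` are hypothetical; because the exact shape of explain output varies between versions, the tree walk relies only on the `stage`, `inputStage`, and `inputStages` fields.

```javascript
// Build an explain command wrapping a distinct command with a query.
function distinctExplainCommand(collection, key, query) {
  return {
    explain: { distinct: collection, key: key, query: query },
    verbosity: "executionStats"
  };
}

// Recursively walk a plan tree and report whether any stage is DISTINCT_SCAN.
function usesDistinctScan(plan) {
  if (!plan) return false;
  if (plan.stage === "DISTINCT_SCAN") return true;
  if (usesDistinctScan(plan.inputStage)) return true;
  var children = plan.inputStages || [];
  for (var i = 0; i < children.length; i++) {
    if (usesDistinctScan(children[i])) return true;
  }
  return false;
}

// In the mongo shell, for example:
// var res = db.runCommand(
//   distinctExplainCommand("dbtest", "field1", { field2: "field2_10" }));
// usesDistinctScan(res.queryPlanner.winningPlan);
```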



Source: https://stackoverflow.com/questions/36011552/mongodb-how-to-use-an-index-for-distinct-command-and-query
