We need to create a compound index in the same order as the parameters are being queried. Does this order matter performance-wise at all?
Imagine we have a collectio
I'm going to say I did an experiment on this myself, and found that there seems to be no performance penalty for using the poorly distinguished index key first. (I'm using mongodb 3.4 with wiredtiger, which may be different than mmap). I inserted 250 million documents into a new collection called items
. Each doc looked like this:
{
field1:"bob",
field2:i + "",
field3:i + ""
"field1"
was always equal to "bob"
. "field2"
was equal to i
, so it was completely unique. First I did a search on field2, and it took over a minute to scan 250 million documents. Then I created an index like so:
`db.items.createIndex({field1:1,field2:1})`
Of course field1 is "bob" on every single document, so the index should have to search a number of items before finding the desired document. However, this was not the result I got.
I did another search on the collection after the index finished creating. This time I got results which I listed below. You'll see that "totalKeysExamined"
is 1 each time. So perhaps with wired tiger or something they have figured out how to do this better. I have read the wiredtiger actually compresses index prefixes, so that may have something to do with it.
db.items.find({field1:"bob",field2:"250888000"}).explain("executionStats")
{
"executionSuccess" : true,
"nReturned" : 1,
"executionTimeMillis" : 4,
"totalKeysExamined" : 1,
"totalDocsExamined" : 1,
"executionStages" : {
"stage" : "FETCH",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 2,
"advanced" : 1,
...
"docsExamined" : 1,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
...
"indexName" : "field1_1_field2_1",
"isMultiKey" : false,
...
"indexBounds" : {
"field1" : [
"[\"bob\", \"bob\"]"
],
"field2" : [
"[\"250888000\", \"250888000\"]"
]
},
"keysExamined" : 1,
"seeks" : 1
}
}
Then I created an index on field3
(which has the same value as field 2). Then I searched:
db.items.find({field3:"250888000"});
It took the same 4ms as the one with the compound index. I repeated this a number of times with different values for field2 and field3 and got insignificant differences each time. This suggests that with wiredtiger, there is no performance penalty for having poor differentiation on the first field of an index.