What is the correct way to Index in MongoDB when big combination of fields exist

柔情痞子 submitted on 2020-06-27 06:06:10

Question


Suppose I have a search panel that includes multiple filter options.

I'm working with MongoDB and created a compound index on 3-4 properties in a specific order. But when I run different combinations of searches, the execution plan (explain()) looks different each time. Sometimes I see a collection scan (COLLSCAN, bad), and sometimes the query fits the index (IXSCAN).

The selective fields that the indexes should handle are: Brand, Types, Status, Warehouse, Carries, and Search (by id only).

My question is:

Do I have to create every combination of fields in every order, which could mean 10-20 compound indexes? Or should I create 1-3 big compound indexes, which still would not solve the ordering problem?

What is the best strategy for dealing with a large variety of field combinations?

I use the same query structure with different combinations of fields.

// Example query.
// The fields (and their order) can differ every time, depending on what the user selects.

db.getCollection("orders").find({
  '$and': [
    {
      'status': {
        '$in': [
          'XXX',
          'YYY'
        ]
      }
    },
    {
      'searchId': {
        '$in': [
          '3859447'
        ]
      }
    },
    {
      'origin.brand': {
        '$in': [
          'aaaa',
          'bbbb',
          'cccc',
          'ddd',
          'eee',
          'bundle'
        ]
      }
    },
    {
      '$or': [
        {
          'origin.carries': 'YYY'
        },
        {
          'origin.carries': 'ZZZ'
        },
        {
          'origin.carries': 'WWWW'
        }
      ]
    }
  ]
}).sort({"timestamp":1})

// My compound index is:
{ status: 1, searchId: -1, "origin.brand": 1, "origin.carries": 1, timestamp: 1 }

But that is only one combination. There could be plenty of others, like:

a. {status: 1}
b. {status: 1, searchId: -1}
c. {status: 1, searchId: -1, "origin.brand": 1}
d. {status: 1, searchId: -1, "origin.brand": 1, "origin.carries": 1}
...

Additionally, what will happen to read/write performance? I think write performance will suffer more than read performance.

The query patterns are:

1. find(...) with '$and'/'$or' + sort

2. Aggregation with $match/$sort

thanks


Answer 1:


Generally, indexes are only useful if they are over a selective field. This means the number of documents that have a particular value is small relative to the overall number of documents.

What "small" means depends on the data set and the query. A 1% selectivity is pretty safe when deciding whether an index makes sense. If a particular value exists in, say, 10% of documents, performing a collection scan may be more efficient than using an index over the respective field.
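To make the selectivity idea concrete, here is a minimal sketch that computes, for a sample of documents, the fraction of documents carrying each value of a field. The helper name `selectivityByValue` and the sample data are made up for illustration; it works on a plain in-memory array of top-level fields, not on a live collection.

```javascript
// Hypothetical helper: fraction of documents that carry each distinct value of `field`.
// `docs` is a plain in-memory sample (top-level fields only), not a MongoDB cursor.
function selectivityByValue(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    const value = doc[field];
    counts.set(value, (counts.get(value) || 0) + 1);
  }
  const result = {};
  for (const [value, count] of counts) {
    result[value] = count / docs.length;
  }
  return result;
}

// Made-up sample: 8 of 10 orders are "OK", so filtering on status "OK" alone
// still touches most of the collection.
const sampleOrders = [
  ...Array.from({ length: 8 }, () => ({ status: "OK" })),
  { status: "FAILED" },
  { status: "PENDING" },
];
const statusSelectivity = selectivityByValue(sampleOrders, "status");
console.log(statusSelectivity);
```

A value that shows up with a fraction near 0.01 or below is a reasonable candidate for indexing by the rule of thumb above; one near 0.8 is not.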

With that in mind, some of your fields will be selective and some will not be. For example, I suspect filtering by status "OK" will not be very selective. You can eliminate non-selective fields from indexing considerations: if someone wants all orders which are "OK" with no other conditions, they'll end up doing a collection scan. If someone wants orders which are "OK" and have other conditions, whatever index is applicable to the other conditions will be used.

Now that you are left with selective (or at least somewhat selective) fields, consider what queries are both popular and selective. For example, perhaps brand+type would be such a combination. You could add compound indexes that match popular queries which you expect to be selective.

Now, what happens if someone filters by brand only? This could be selective or not depending on the data. If you already have a compound index on brand+type, you'd leave it up to the database to determine whether a brand-only query is more efficient to fulfill via the brand+type index or via a collection scan.
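The reason a brand-only query can use a brand+type index at all is that a compound index supports queries on its leading fields (a prefix). A rough sketch of that check follows; `matchedPrefixLength` and the field names `brand` and `type` are hypothetical, and the real query planner weighs much more than this.

```javascript
// Hypothetical helper: how many leading fields of a compound index the query filters on.
// A result of 0 means the query cannot bound a scan of this index by its filter.
function matchedPrefixLength(indexFields, queryFields) {
  const wanted = new Set(queryFields);
  let n = 0;
  while (n < indexFields.length && wanted.has(indexFields[n])) {
    n++;
  }
  return n;
}

const brandTypeIndex = ["brand", "type"]; // assumed index { brand: 1, type: 1 }

const brandOnly = matchedPrefixLength(brandTypeIndex, ["brand"]);
const typeOnly = matchedPrefixLength(brandTypeIndex, ["type"]);
// The order of conditions inside the query does not matter, only the index order does.
const reversed = matchedPrefixLength(brandTypeIndex, ["type", "brand"]);
console.log({ brandOnly, typeOnly, reversed });
```

Under these assumptions the brand-only query matches one leading field, the type-only query matches none, and a query listing type before brand still matches both.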

Continue in this manner with other popular queries and fields.




Answer 2:


So you have subdocuments, ranged queries, and sorting by 1 field only.

That eliminates most of the possible permutations, assuming there are no other surprises.

D. SM already covered selectivity - you should really listen to what the man says and at least upvote.

The other thing to consider is the order of the fields in the compound index:

  1. fields that have direct match like $eq
  2. fields you sort on
  3. fields with ranged queries: $in, $lt, $or etc

These are common rules for all b-trees. Now things that are specific to mongo:

A compound index can have no more than one multikey field, that is, a field indexed inside an array of subdocuments, like "origin.brand". Again, I assume origin is an array of embedded documents, so the document's shape is like this:

{
    _id: ...,
    status: ...,
    timestamp: ....,
    origin: [
        {brand: ..., carries: ...},
        {brand: ..., carries: ...},
        {brand: ..., carries: ...}
    ]
}

For your query the best index would be

{
  searchId: 1,
  timestamp: 1,
  status: 1, /** only if it is selective enough **/
  "origin.carries" : 1 /** or brand, depending on data **/
}
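The ordering rule from this answer (direct matches first, then sort fields, then ranged fields) can be sketched as a small helper that assembles an index specification. `buildIndexSpec` is a hypothetical name, and classifying searchId as a direct match is an assumption taken from the index above; this is an illustration, not a replacement for checking explain().

```javascript
// Hypothetical helper: orders index keys as equality fields, then sort fields,
// then range fields. A field that was already placed is not repeated.
function buildIndexSpec({ equality = [], sort = {}, range = [] }) {
  const spec = {};
  for (const field of equality) {
    spec[field] = 1;
  }
  for (const [field, direction] of Object.entries(sort)) {
    if (!(field in spec)) spec[field] = direction;
  }
  for (const field of range) {
    if (!(field in spec)) spec[field] = 1;
  }
  return spec;
}

// Assumed classification of the fields in the example query.
const spec = buildIndexSpec({
  equality: ["searchId"],
  sort: { timestamp: 1 },
  range: ["status", "origin.carries"],
});
console.log(spec);
```

The resulting object could then be passed to db.getCollection("orders").createIndex(spec) in the shell.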

Regarding the number of indexes - it depends on data size. Ensure all indexes fit into RAM, otherwise it will be really slow.

Last but not least - indexing is not a one-off job but a lifestyle. Data change over time, so do queries. If you care about performance and have finite resources, you should keep an eye on the database. Check slow queries to add new indexes, collect stats from users' queries to remove unused indexes and free up some room. Basically, apply common sense.
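As one way to spot unused indexes, the $indexStats aggregation stage reports per-index usage counters. The sketch below only filters documents of that shape; `unusedIndexes` is a hypothetical helper, the sample stats are made up, and converting `accesses.ops` with Number() is an assumption about how the shell's 64-bit integer type behaves.

```javascript
// Hypothetical helper: names of indexes reported as never used.
// `stats` mirrors the documents returned by
//   db.getCollection("orders").aggregate([{ $indexStats: {} }])
// which have the shape { name, key, accesses: { ops, since } }.
function unusedIndexes(stats) {
  return stats
    .filter((s) => s.name !== "_id_" && Number(s.accesses.ops) === 0)
    .map((s) => s.name);
}

// Made-up stats for illustration only.
const since = new Date("2020-06-01T00:00:00Z");
const sampleStats = [
  { name: "_id_", key: { _id: 1 }, accesses: { ops: 0, since } },
  { name: "searchId_1_timestamp_1", key: { searchId: 1, timestamp: 1 }, accesses: { ops: 1520, since } },
  { name: "status_1", key: { status: 1 }, accesses: { ops: 0, since } },
];
const candidates = unusedIndexes(sampleStats);
console.log(candidates);
```

The counters are only meaningful over a long enough window, since they start from the `since` timestamp, so a zero count shortly after a restart says little.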



Source: https://stackoverflow.com/questions/62263023/what-is-the-correct-way-to-index-in-mongodb-when-big-combination-of-fields-exist
