MongoDB query too slow when using $or operator

不问归期 submitted on 2021-02-07 04:15:04

Question


I'm trying to run this query against my collection Audios:

    var querySlow = {
        "palabra": {
            $regex: "^" + keywords,
            "$options": "i"
        },
        $or: [{
            "_p_pais": {
                $in: interested_accents
            }
        }, {
            "languageCodeTatoeba": {
                $in: interested_accents_tatoeba
            }
        }]
    }; // takes 20 seconds

This is really slow, but if I remove either of the $or clauses, it becomes very fast. For example:

    var queryFast1 = {
        "palabra": {
            $regex: "^" + keywords,
            "$options": "i"
        },
        $or: [{
            "_p_pais": {
                $in: interested_accents
            }
        }]
    }; // takes less than 1 second

or this

    var queryFast2 = {
        "palabra": {
            $regex: "^" + keywords,
            "$options": "i"
        },
        $or: [{
            "languageCodeTatoeba": {
                $in: interested_accents_tatoeba
            }
        }]
    }; // takes less than 1 second

This is the .explain() output of the slow query:

http://pastebin.com/nrhjB1wf

I don't really know how to manage the indexes. Should I create an index on this collection?


Answer 1:


There are some issues with your query and indexes:

1. $or uses indexes differently

MongoDB only uses one index for a query, with the exception of queries involving an $or clause. From the Indexing Strategies page:

Generally, MongoDB only uses one index to fulfill most queries. However, each clause of an $or query may use a different index

Also from the $or Clauses and Indexes page:

That is, for MongoDB to use indexes to evaluate an $or expression, all the clauses in the $or expression must be supported by indexes.

With regard to your query, you could try to rearrange the query to make the $or clause a top-level clause:

{$or: [
    {"palabra": {...}, "_p_pais": {...} },
    {"palabra": {...}, "languageCodeTatoeba": {...}}
]}

In this form, MongoDB can use two indexes:

  • Compound index with palabra and _p_pais terms, and
  • Compound index with palabra and languageCodeTatoeba terms
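The rearrangement above can be sketched in plain JavaScript. This is only an illustration: the helper name buildTopLevelOrQuery, the sample values ("hol", "es", "mx", "spa") and the field order inside the two index key patterns are assumptions, not something taken from your collection. The shell calls are left as comments so the snippet runs without a database.

```javascript
// Repeat the shared "palabra" condition inside each $or branch so that
// every branch can be served by its own compound index.
function buildTopLevelOrQuery(keywords, accents, accentsTatoeba) {
    const palabra = { $regex: "^" + keywords, $options: "i" };
    return {
        $or: [
            { palabra: palabra, _p_pais: { $in: accents } },
            { palabra: palabra, languageCodeTatoeba: { $in: accentsTatoeba } }
        ]
    };
}

// Key patterns for the two compound indexes (field order is an assumption).
const indexSpecs = [
    { palabra: 1, _p_pais: 1 },
    { palabra: 1, languageCodeTatoeba: 1 }
];

// In the mongo shell these would be used roughly as:
//   indexSpecs.forEach(function (spec) { db.Audios.createIndex(spec); });
//   db.Audios.find(query).explain("executionStats");

// Hypothetical sample input, for illustration only.
const query = buildTopLevelOrQuery("hol", ["es", "mx"], ["spa"]);
console.log(JSON.stringify(query, null, 2));
```

Whether the planner actually picks these indexes still has to be confirmed with explain() on your own data.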

Please use explain("executionStats") to check whether the indexes are used as intended. The key metric is the ratio of documents returned (nReturned) to documents and keys examined (totalDocsExamined, totalKeysExamined). The closer that ratio is to 1, the more selective your query is, and the better the performance.

For example, if MongoDB has to examine 1000 docs (totalDocsExamined: 1000) but only returns 10 documents (nReturned: 10), then your query is not very selective (a ratio of 10/1000). Ideal queries have a ratio close to or equal to 1, e.g. nReturned: 10, totalDocsExamined: 10, a ratio of 1 (10/10).
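As a rough sketch, the ratio can be computed from the executionStats section of an explain result. The helper name selectivity is hypothetical, and taking the larger of the docs-examined and keys-examined counts as the denominator is an assumption made here for simplicity. The two sample objects mirror the numbers in the example above and are not real explain output.

```javascript
// Ratio of returned documents to examined documents/keys.
// Values near 1 mean little wasted work; values near 0 mean a poorly selective plan.
function selectivity(explainResult) {
    const stats = explainResult.executionStats;
    // Assumption: use whichever examined count is larger as the cost.
    const examined = Math.max(stats.totalDocsExamined, stats.totalKeysExamined);
    if (examined === 0) {
        return 1; // nothing was examined, so nothing was wasted
    }
    return stats.nReturned / examined;
}

// Hand-written sample stats matching the example numbers above.
const poor = { executionStats: { nReturned: 10, totalDocsExamined: 1000, totalKeysExamined: 1000 } };
const ideal = { executionStats: { nReturned: 10, totalDocsExamined: 10, totalKeysExamined: 10 } };

console.log(selectivity(poor));  // 0.01
console.log(selectivity(ideal)); // 1
```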

For more information regarding explain(), please see:

  • Explain Results
  • db.collection.explain()

2. Too many indexes

Having too many indexes could lead to:

  • The query planner choosing a sub-optimal index, because several candidate indexes look equally suitable for the query.
  • Relatively slow insert/update performance, because every insert or update to an indexed field must also update each index that contains that field.

From the explain result you posted, you have at least these indexes in the collection:

_p_pais_-1__p_user_-1__created_at_-1
languageCodeTatoeba_1_lowercase_1
languageCodeTatoeba_1
languageCodeTatoeba_-1
_p_pais_-1
_p_pais_1_languageCodeTatoeba_1
palabra_-1
palabra_1__created_at_-1

There are two issues with this set of indexes:

  1. Among the indexes, some are redundant. For example, languageCodeTatoeba_1 (an ascending index) and languageCodeTatoeba_-1 (a descending index) are practically the same index. One of them can be removed without any effect on query performance.
  2. Several indexes are prefixes of others. For example, palabra_-1 and palabra_1__created_at_-1. The palabra_-1 index can be removed, since it is a prefix of the palabra_1__created_at_-1 index (the direction does not matter for a single-field index). Please see the Compound Index: Prefix page for more details.

From a cursory glance, you may be able to trim your index list to only contain these 4 indexes instead of 8:

_p_pais_-1__p_user_-1__created_at_-1
languageCodeTatoeba_1_lowercase_1
_p_pais_1_languageCodeTatoeba_1
palabra_1__created_at_-1
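The trimming above can be checked mechanically. The sketch below treats an index as redundant when its key pattern is a strict prefix of another index's key pattern, either in the same direction or with every direction flipped (an index can be walked in reverse). The key patterns are inferred from the index names in your explain output, and the helper names isPrefixOf and findRedundant are hypothetical.

```javascript
// Key patterns inferred from the index names listed above.
const indexes = {
    "_p_pais_-1__p_user_-1__created_at_-1": { _p_pais: -1, _p_user: -1, _created_at: -1 },
    "languageCodeTatoeba_1_lowercase_1": { languageCodeTatoeba: 1, lowercase: 1 },
    "languageCodeTatoeba_1": { languageCodeTatoeba: 1 },
    "languageCodeTatoeba_-1": { languageCodeTatoeba: -1 },
    "_p_pais_-1": { _p_pais: -1 },
    "_p_pais_1_languageCodeTatoeba_1": { _p_pais: 1, languageCodeTatoeba: 1 },
    "palabra_-1": { palabra: -1 },
    "palabra_1__created_at_-1": { palabra: 1, _created_at: -1 }
};

// True when `shorter` is a strict prefix of `longer`, with all directions
// either identical or all reversed.
function isPrefixOf(shorter, longer) {
    const a = Object.entries(shorter);
    const b = Object.entries(longer);
    if (a.length >= b.length) return false;
    const sameFields = a.every(([field], i) => b[i][0] === field);
    const sameDir = a.every(([, dir], i) => b[i][1] === dir);
    const flippedDir = a.every(([, dir], i) => b[i][1] === -dir);
    return sameFields && (sameDir || flippedDir);
}

// Names of indexes that are covered by some other, longer index.
function findRedundant(indexMap) {
    const names = Object.keys(indexMap);
    return names.filter(name =>
        names.some(other => other !== name && isPrefixOf(indexMap[name], indexMap[other])));
}

const redundant = findRedundant(indexes);
console.log(redundant);
// [ 'languageCodeTatoeba_1', 'languageCodeTatoeba_-1', '_p_pais_-1', 'palabra_-1' ]
```

Dropping those four leaves the same four indexes listed above. Before dropping anything for real, check that no other query in your application depends on an index for a reason this simple prefix test does not capture.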

Please see the following links for more information regarding indexes:

  • Create Indexes to Support Your Queries
  • Indexing Strategies

3. Why removing one clause from the $or term speeds up the queries

This is because the query

{"palabra": {...}, $or: [{"_p_pais": {...}}]}

is essentially the same as

{"palabra": {...}, "_p_pais": {...}}

Assuming you have a compound index such as palabra_1__p_pais_1, MongoDB would be able to use that index.

Similarly,

{"palabra": {...}, $or: [{"languageCodeTatoeba": {...}}]}

is essentially the same as

{"palabra": {...}, "languageCodeTatoeba": {...}}

This query could use an index whose leading field is languageCodeTatoeba, such as languageCodeTatoeba_1_lowercase_1, which you already have in your collection.

In short, those two queries are fast because you removed the $or clause, enabling MongoDB to use the correct index.
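The equivalence described in this section can be illustrated with a small helper that folds a single-clause $or into its parent query. The name flattenSingleOr and the sample query are hypothetical, and the sketch assumes the $or clause and the outer query do not share any field names.

```javascript
// Fold a one-element $or into the surrounding query; leave anything else untouched.
// Assumption: the single clause shares no field names with the outer query.
function flattenSingleOr(query) {
    if (!Array.isArray(query.$or) || query.$or.length !== 1) {
        return query;
    }
    const { $or, ...rest } = query;
    return { ...rest, ...$or[0] };
}

// Hypothetical sample, shaped like queryFast1.
const fast1 = {
    palabra: { $regex: "^hol", $options: "i" },
    $or: [{ _p_pais: { $in: ["es"] } }]
};

console.log(JSON.stringify(flattenSingleOr(fast1)));
// {"palabra":{"$regex":"^hol","$options":"i"},"_p_pais":{"$in":["es"]}}
```

A query with two or more $or clauses is returned unchanged, which is exactly the case where the index choice becomes harder.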



Source: https://stackoverflow.com/questions/42329806/mongodb-query-to-slow-when-using-or-operator
