Index Bounds on Mongo Regex Search

Submitted by 陌路散爱 on 2021-02-08 12:20:25

Question


I'm using MongoDB, and I have a collection of documents with the following structure:

{
    fName:"Foo",
    lName:"Barius",
    email:"fbarius@example.com",
    search:"foo barius"
}

I am building a function that will perform a regular expression search on the search field. To optimize performance, I have indexed this collection on the search field. However, things are still a bit slow. So I ran an explain() on a sample query:

db.Collection.find({search:/bar/}).explain();

Looking under the winning plan, I see the following index bounds used:

"search": [
        "[\"\", {})",
        "[/.*bar.*/, /.*bar.*/]"
]

The second set makes sense - it's looking from anything that contains bar to anything that contains bar. However, the first set baffles me. It appears to be looking in the bounds of "" inclusive to {} exclusive. I'm concerned that this extra set of bounds is slowing down my query. Is it necessary to keep? If it's not, how can I prevent it from being included?


Answer 1:


I think this is just how MongoDB handles regex queries against an index (see https://scalegrid.io/blog/mongodb-regular-expressions-indexes-performance/). Keep an eye on the nscanned/totalKeysExamined value in the explain output: if it is much larger than the number of documents returned, the index is doing little for your query.

See also: MongoDB, performance of query by regular expression on indexed fields
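
To make that check concrete, here is a minimal sketch of a helper that reads the relevant counters out of an explain("executionStats") result. The helper name scanEfficiency is hypothetical and not part of MongoDB; executionStats.totalKeysExamined, totalDocsExamined, and nReturned are the fields explain reports in executionStats mode. The sample object holds made-up numbers for illustration, not measured output.

```javascript
// Hypothetical helper (not a MongoDB API): summarizes how much of the index
// a query walked compared with how many documents it returned.
function scanEfficiency(explainResult) {
  const stats = explainResult.executionStats;
  const keys = stats.totalKeysExamined;
  const returned = stats.nReturned;
  return {
    keysExamined: keys,
    docsExamined: stats.totalDocsExamined,
    returned: returned,
    // Keys examined per returned document; a value near 1 means the index is selective.
    keysPerResult: returned === 0 ? Infinity : keys / returned,
  };
}

// Made-up numbers for illustration only.
const sampleExplain = {
  executionStats: { nReturned: 20, totalKeysExamined: 100000, totalDocsExamined: 20 },
};
const summary = scanEfficiency(sampleExplain);
console.log(summary.keysPerResult); // 5000
```

In the shell, the input would come from something like db.Collection.find({search:/bar/}).explain("executionStats"). A large keysPerResult suggests the query is walking most of the index to find a handful of matches.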




Answer 2:


This is how MongoDB works when an unanchored regex meets an index: you are searching for /bar/ rather than /^bar/.

When you specify an index on that field, values are ordered starting from their first character. So "foo barius" is indexed beginning with f. Since you are searching for "bar" anywhere in the field, the query has to walk the entire index on that field looking for bar.

The first line in your explain says: look at every string value in the index (from "" inclusive up to {} exclusive covers all strings).

The second line says: of the entries from (1), give me only those that have bar in them.

Bottom line: Design your records so they use the index efficiently. In the case of strings, make sure your searches are at the beginning of the string, e.g., /^bar/. If I'm going to search by last name then it needs to occur first in an indexed field.

As an exercise, do an explain on /^bar/ instead. You won't get your data, because "foo barius" does not start with bar, but the first index bounds will be something like ["bar", "bas").
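
The difference between the two kinds of bounds can be sketched with a toy model. This is an illustration only, not how MongoDB is implemented: the "index" is a sorted array of strings, and the key values and function names are invented for the example. It assumes a non-empty prefix.

```javascript
// Toy model of a single-field index: a sorted array of string keys.
const indexKeys = [
  "alice smith", "bar none", "barius foo", "bob barker", "foo barius", "zed bar",
].sort();

// Unanchored /bar/: any key might match, so every key has to be examined.
function scanUnanchored(keys, needle) {
  let examined = 0;
  const matches = [];
  for (const key of keys) {
    examined += 1;
    if (key.includes(needle)) matches.push(key);
  }
  return { examined, matches };
}

// Anchored /^bar/: only keys in [prefix, upper) can match, where upper is the
// prefix with its last character incremented ("bar" -> "bas").
function scanAnchored(keys, prefix) {
  const lastCode = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(lastCode + 1);
  // Binary search for the first key >= prefix.
  let lo = 0;
  let hi = keys.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (keys[mid] < prefix) lo = mid + 1;
    else hi = mid;
  }
  let examined = 0;
  const matches = [];
  for (let i = lo; i < keys.length && keys[i] < upper; i++) {
    examined += 1;
    matches.push(keys[i]);
  }
  return { examined, matches, bounds: [prefix, upper] };
}

const unanchored = scanUnanchored(indexKeys, "bar");
const anchored = scanAnchored(indexKeys, "bar");
console.log(unanchored.examined); // 6: the whole index
console.log(anchored.examined);   // 2: only "bar none" and "barius foo"
console.log(anchored.bounds);     // [ 'bar', 'bas' ]
```

Note that the anchored scan finds fewer matches here, since it only sees values that begin with "bar". That is the trade-off described above: the field has to be laid out so that the part you search on comes first.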

I hope my stream of consciousness answer is helpful.

UDude




Answer 3:


Thought I'd add my two cents.

The previous two answers are correct. A regular expression can only use a standard index efficiently if it is anchored to the beginning of the string. In fact, having an index and searching by an unanchored regex can have a detrimental effect on your search, because the query attempts to use the index but can't narrow anything down with it.

There is another type of index that may be useful in your situation: MongoDB's text index. It indexes each word, split on spaces, so it would be able to do an indexed search on both the words "foo" and "barius", which might be more useful.

Here are the docs for that: https://docs.mongodb.com/manual/core/index-text/
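
A minimal sketch of what that could look like, assuming the collection and field names from the question. The two wrapper functions are hypothetical; createIndex with a "text" index type and the $text / $search query operator are the MongoDB features being used.

```javascript
// Hypothetical wrappers; `collection` is assumed to be a collection handle
// such as db.Collection in the mongo shell.
function ensureSearchTextIndex(collection) {
  // Builds a text index over the words in the `search` field.
  return collection.createIndex({ search: "text" });
}

function findByWord(collection, word) {
  // $text consults the text index; it matches whole words, not substrings.
  return collection.find({ $text: { $search: word } });
}
```

In the shell this would be called as ensureSearchTextIndex(db.Collection) followed by findByWord(db.Collection, "barius"). One caveat: because a text search matches whole words, searching for "bar" would not be expected to find "barius", so this only helps if users type complete words.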



Source: https://stackoverflow.com/questions/38209993/index-bounds-on-mongo-regex-search
