As suggested by Derick, I refactored the data in my database such that I have "wordforms" as a collection rather than as subdocuments under "lexemes".
The results were in fact better!
Here are some speed comparisons. The last example in each table uses hint to intentionally bypass the index on surface_form; in the old schema, that was actually faster than using the index.
Old schema (see original question)

Query                                                             Avg. time
db.lexemes.find({"wordforms.surface_form":"skrun"})               0s
db.lexemes.find({"wordforms.surface_form":/^skr/})                1.0s
db.lexemes.find({"wordforms.surface_form":/skru/})                > 3 mins (!)
db.lexemes.find({"wordforms.surface_form":/skru/}).hint('_id_')   2.8s
New schema (see Derick's answer)

Query                                                             Avg. time
db.wordforms.find({"surface_form":"skrun"})                       0s
db.wordforms.find({"surface_form":/^skr/})                        0.001s
db.wordforms.find({"surface_form":/skru/})                        1.4s
db.wordforms.find({"surface_form":/skru/}).hint('_id_')           3.0s
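One way to see why the anchored regex is so much faster than the unanchored one: the index on surface_form is ordered, so /^skr/ can be turned into a bounded range scan, while /skru/ forces an inspection of every key (and hint('_id_') forces a scan of every document instead). A toy sketch of that difference, with hypothetical dictionary data rather than MongoDB itself:

```javascript
// Toy model of an ordered index on surface_form: a sorted array of keys.
const index = ["ampskru", "skrun", "skrupla", "skruplat", "zebra"].sort();

// Anchored prefix query (/^skr/): binary-search to the first candidate key,
// then walk forward until keys stop matching -- a bounded range scan.
function prefixScan(keys, prefix) {
  let lo = 0, hi = keys.length;
  while (lo < hi) {                      // lower bound of the prefix
    const mid = (lo + hi) >> 1;
    if (keys[mid] < prefix) lo = mid + 1; else hi = mid;
  }
  const out = [];
  for (let i = lo; i < keys.length && keys[i].startsWith(prefix); i++) {
    out.push(keys[i]);
  }
  return out;
}

// Unanchored query (/skru/): no starting point can be derived from the
// pattern, so every key has to be inspected.
function fullScan(keys, re) {
  return keys.filter(k => re.test(k));
}

console.log(prefixScan(index, "skr"));  // only the contiguous "skr..." range is touched
console.log(fullScan(index, /skru/));   // all keys are inspected
```

This is only a model of the access pattern, not of MongoDB's actual B-tree, but it captures why /^skr/ dropped to 0.001s while /skru/ stayed at full-scan speed.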
For me this is pretty good evidence that the refactored schema makes searching faster, and is worth the redundant data (or the extra join required).
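The "extra join" is just an application-side lookup: fetch the matching wordform, then fetch its parent lexeme by the stored reference. A minimal sketch, assuming each wordform carries a lexeme_id field pointing back to its lexeme (the field name and sample data are illustrative, not from the original schema):

```javascript
// Toy stand-ins for the two collections after refactoring.
const lexemes = new Map([
  [1, { _id: 1, lemma: "skrun" }],
]);
const wordforms = [
  { _id: 10, lexeme_id: 1, surface_form: "skrun" },
  { _id: 11, lexeme_id: 1, surface_form: "skrejjen" },
];

// Equivalent of db.wordforms.find({surface_form: sf}) followed by a
// second query on lexemes by _id -- the "join" done in the application.
function findLexemeBySurfaceForm(sf) {
  const wf = wordforms.find(w => w.surface_form === sf); // indexed lookup in MongoDB
  return wf ? lexemes.get(wf.lexeme_id) : null;          // fetch parent by reference
}
```

The cost is one extra round trip per hit, which the timings above suggest is cheap next to the regex scan itself.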