Speed up regex string search in MongoDB

半阙折子戏 2021-01-01 20:02

I'm trying to use MongoDB to implement a natural language dictionary. I have a collection of lexemes, each of which has a number of wordforms as subdocuments. This is what

2 Answers
  •  北海茫月
    2021-01-01 20:37

    As suggested by Derick, I refactored my database so that "wordforms" is a collection of its own rather than a set of subdocuments under "lexemes". The results were in fact better! Here are some speed comparisons. The last query in each table uses hint to deliberately bypass the index on surface_form; in the old schema, bypassing the index was actually faster than using it for the unanchored regex.

    Old schema (see original question)

    Query                                                              Avg. Time
    db.lexemes.find({"wordforms.surface_form":"skrun"})                0s
    db.lexemes.find({"wordforms.surface_form":/^skr/})                 1.0s
    db.lexemes.find({"wordforms.surface_form":/skru/})                 > 3 min!
    db.lexemes.find({"wordforms.surface_form":/skru/}).hint('_id_')    2.8s
    

    New schema (see Derick's answer)

    Query                                                              Avg. Time
    db.wordforms.find({"surface_form":"skrun"})                        0s
    db.wordforms.find({"surface_form":/^skr/})                         0.001s
    db.wordforms.find({"surface_form":/skru/})                         1.4s
    db.wordforms.find({"surface_form":/skru/}).hint('_id_')            3.0s
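
The gap between the /^skr/ and /skru/ timings is what one would expect from how an ordered index works: a case-sensitive regex anchored at the start of the string can be answered from a contiguous range of index keys, while an unanchored one has to test every key. The following is only a plain JavaScript sketch of that idea on a sorted array, not MongoDB's actual implementation; the sample words and the function names (lowerBound, prefixRange, substringScan) are invented for illustration.

```javascript
// Binary search: first position in a sorted array whose value is >= target.
function lowerBound(sorted, target) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sorted[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Emulates an anchored search such as /^skr/: only the keys in the range
// [prefix, nextPrefix) are touched. Assumes a non-empty prefix.
function prefixRange(sorted, prefix) {
  const lastCode = prefix.charCodeAt(prefix.length - 1);
  const nextPrefix = prefix.slice(0, -1) + String.fromCharCode(lastCode + 1);
  return sorted.slice(lowerBound(sorted, prefix), lowerBound(sorted, nextPrefix));
}

// Emulates an unanchored search such as /skru/: every key must be checked.
function substringScan(sorted, needle) {
  return sorted.filter((s) => s.includes(needle));
}

// Invented sample keys, sorted the way an ascending index would order them.
const indexKeys = ["skata", "skrapa", "skruv", "skruva", "skrun", "skydd", "nedskruva"].sort();

const anchored = prefixRange(indexKeys, "skr");
const unanchored = substringScan(indexKeys, "skru");
console.log(anchored);   // keys starting with "skr"
console.log(unanchored); // keys containing "skru" anywhere
```

The anchored lookup costs two binary searches plus the size of the result, whereas the substring scan grows with the total number of keys, which matches the direction of the 0.001s versus 1.4s timings above.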
    

    To me, this is pretty good evidence that the refactored schema makes searching faster and is worth the redundant data (or the extra join required).
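
For readers who want to try the same refactoring, here is a minimal sketch of how embedded wordforms could be flattened into documents for a separate collection. The question's actual document layout is not shown in full, so the lemma field, the lexeme_id back-reference, and the helper name flattenLexemes are assumptions made for illustration; only wordforms and surface_form come from the queries above.

```javascript
// Hypothetical helper: turn lexeme documents with embedded wordforms into
// flat documents suitable for a separate "wordforms" collection.
function flattenLexemes(lexemes) {
  const wordforms = [];
  for (const lexeme of lexemes) {
    for (const wf of lexeme.wordforms || []) {
      // lexeme_id is an assumed field name; it keeps the link back to the
      // parent lexeme, which is the "extra join" mentioned above.
      wordforms.push({ ...wf, lexeme_id: lexeme._id });
    }
  }
  return wordforms;
}

// Invented sample data in the shape of the old schema.
const sampleLexemes = [
  { _id: 1, lemma: "skrun", wordforms: [{ surface_form: "skrun" }, { surface_form: "skruna" }] },
  { _id: 2, lemma: "skruv", wordforms: [{ surface_form: "skruv" }] },
];

const flatWordforms = flattenLexemes(sampleLexemes);
console.log(flatWordforms);

// In the mongo shell, the result could then be loaded and indexed with:
//   db.wordforms.insertMany(flatWordforms);
//   db.wordforms.createIndex({ surface_form: 1 });
```

Looking up the parent lexeme for a matching wordform then takes a second query on the lexemes collection by lexeme_id, which is the cost being traded for the faster regex search.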
