I'm trying to use MongoDB to implement a natural language dictionary. I have a collection of lexemes, each of which has a number of wordforms as subdocuments.
One possibility would be to store all the variants that you think might be useful as an array element, though I'm not sure whether that would be practical:
{
"number" : "pl",
"surface_form" : "skrejjen",
"surface_forms: [ "skrej", "skre" ],
"phonetic" : "'skrɛjjɛn",
"pattern" : "CCCVCCVC"
}
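If you did go that route, a multikey index on the array would let you match any stored variant with a single equality query. A minimal sketch in the mongo shell, where the collection name "lexemes" and the field path "forms.surface_forms" are my assumptions, not something from your schema:

// Multikey index: indexes every element of the variants array.
// "lexemes" and "forms.surface_forms" are assumed names.
db.lexemes.createIndex({ "forms.surface_forms": 1 })

// An equality match against an array field matches any of its elements,
// so this finds the lexeme whether the user searched "skrej" or "skre".
db.lexemes.find({ "forms.surface_forms": "skrej" })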
I would probably also suggest not storing 1000 word forms with each word, but turning this around to have smaller documents. The smaller your documents are, the less MongoDB has to read into memory for each search (as long as the search conditions don't require a full scan, of course):
{
"word": {
"pos" : "N",
"lemma" : "skrun",
"gloss" : "screw",
},
"form" : {
"number" : "sg",
"surface_form" : "skrun",
"phonetic" : "ˈskruːn",
"gender" : "m"
},
"source" : "Mayer2013"
}
{
"word": {
"pos" : "N",
"lemma" : "skrun",
"gloss" : "screw",
},
"form" : {
"number" : "pl",
"surface_form" : "skrejjen",
"phonetic" : "'skrɛjjɛn",
"pattern" : "CCCVCCVC"
},
"source" : "Mayer2013"
}
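With one form per document, a single index then turns the common lookup into an index seek instead of a collection scan. A sketch in the mongo shell, assuming these documents live in a collection called "forms" (the collection name is my assumption; the field path comes from the documents above):

// One ascending index on the embedded surface form.
db.forms.createIndex({ "form.surface_form": 1 })

// Finds the plural entry above via the index, without a full collection scan.
db.forms.find({ "form.surface_form": "skrejjen" })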
I also doubt that MySQL would perform better here with searches for random word forms, as it would have to do a full table scan just as MongoDB would. The only thing that could help there is a query cache, but that is something you could quite easily build into your search UI/API in your application.
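As a sketch of what such an application-level cache could look like, here is a minimal memoised lookup in Node.js using the official mongodb driver; the connection URI, database name, and collection name are all placeholders I've assumed:

// Illustrative in-process query cache; all names below are assumptions.
const { MongoClient } = require("mongodb");

const client = new MongoClient("mongodb://localhost:27017"); // assumed URI
const cache = new Map(); // surface form -> cached array of matching documents

async function findForms(surfaceForm) {
    if (cache.has(surfaceForm)) {
        return cache.get(surfaceForm); // cache hit: no database round trip
    }
    await client.connect(); // no-op if the client is already connected
    const results = await client
        .db("dictionary")       // assumed database name
        .collection("forms")    // assumed collection name
        .find({ "form.surface_form": surfaceForm })
        .toArray();
    cache.set(surfaceForm, results); // repeated identical searches hit the Map
    return results;
}

Repeated searches for the same form then skip MongoDB entirely; in a real deployment you would want to bound the cache size or add expiry.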