MongoDB - Full Text Index - Full Text Search - stemming

二次信任 submitted on 2019-12-20 04:12:17

Question


I noticed that if I store the value 'seasons' in a full-text-indexed string field of a collection, MongoDB finds it when I query for 'season'. But if I store something more complex, e.g. 'mice' or 'criteria', it does not find these values when I query for 'mouse' or 'criterion', respectively. Is that normal, and are there clear rules about what MongoDB can and cannot stem?

[test] 2014-03-30 18:25:09.551 >>> db.TestFullText7.find();
{
        "_id" : ObjectId("53389720063ab25d2d55c94c"),
        "dt" : ISODate("2014-03-30T22:13:52.717Z"),
        "title" : "mice",
        "txt" : "mice"
}
{
        "_id" : ObjectId("5338994c063ab25d2d55c94d"),
        "dt" : ISODate("2014-03-30T22:23:08.259Z"),
        "title" : "criteria",
        "txt" : "criteria"
}
{
        "_id" : ObjectId("533899c5063ab25d2d55c94e"),
        "dt" : ISODate("2014-03-30T22:25:09.551Z"),
        "title" : "seasons",
        "txt" : "seasons"
}
[test] 2014-03-30 18:25:13.295 >>> db.runCommand({"text" : "TestFullText7", "search" : "season"});
{
        "queryDebugString" : "season||||||",
        "language" : "english",
        "results" : [
                {
                        "score" : 2,
                        "obj" : {
                                "_id" : ObjectId("533899c5063ab25d2d55c94e"),
                                "dt" : ISODate("2014-03-30T22:25:09.551Z"),
                                "title" : "seasons",
                                "txt" : "seasons"
                        }
                }
        ],
        "stats" : {
                "nscanned" : 1,
                "nscannedObjects" : 0,
                "n" : 1,
                "nfound" : 1,
                "timeMicros" : 148
        },
        "ok" : 1
}
[test] 2014-03-30 18:25:22.406 >>> db.runCommand({"text" : "TestFullText7", "search" : "mouse"});
{
        "queryDebugString" : "mous||||||",
        "language" : "english",
        "results" : [ ],
        "stats" : {
                "nscanned" : 0,
                "nscannedObjects" : 0,
                "n" : 0,
                "nfound" : 0,
                "timeMicros" : 110
        },
        "ok" : 1
}
[test] 2014-03-30 18:25:30.986 >>> db.TestFullText7.getIndexes();
[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "ns" : "test.TestFullText7",
                "name" : "_id_"
        },
        {
                "v" : 1,
                "key" : {
                        "_fts" : "text",
                        "_ftsx" : 1
                },
                "ns" : "test.TestFullText7",
                "name" : "$**_text",
                "weights" : {
                        "$**" : 1
                },
                "default_language" : "english",
                "language_override" : "language",
                "textIndexVersion" : 1
        }
]
[test] 2014-03-30 18:25:45.228 >>>

Answer 1:


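MongoDB uses the Snowball stemming library. Unfortunately, this appears to be one of that library's limitations: Snowball stemmers are rule-based suffix strippers, so they handle regular forms such as 'seasons' to 'season', but they do not relate irregular plurals such as 'mice' or 'criteria' to their singular forms.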

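The Snowball project publishes pages for its English stemmer, including a sample vocabulary alongside the stemmed equivalent of each word. Comparing the two, you can see "mouse" becoming "mous" while "mice" remains "mice" (the "mous" stem also shows up in the queryDebugString above).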
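To make the mechanism concrete: a text index stores stemmed terms, the query term is stemmed the same way, and a document matches only when the two stems are equal. The Python sketch below models that with a deliberately tiny, hypothetical stemmer (`toy_stem`, a single rule that drops a trailing "s") and a hypothetical in-memory index (`build_index`, `search`). It is not Snowball's English algorithm and not MongoDB's implementation; it only illustrates why 'seasons' meets 'season' while 'mice' and 'mouse' never produce the same stem.

```python
from collections import defaultdict

VOWELS = set("aeiou")


def toy_stem(word):
    """Hypothetical one-rule stemmer (NOT Snowball): drop a trailing 's'
    unless the word ends in 'ss', and only if a vowel remains."""
    w = word.lower()
    if w.endswith("s") and not w.endswith("ss") and VOWELS & set(w[:-1]):
        return w[:-1]
    return w


def build_index(docs):
    """Map each stemmed term to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[toy_stem(token)].add(doc_id)
    return index


def search(index, query):
    """Stem each query term with the same rule and union the matching ids."""
    hits = set()
    for token in query.split():
        hits |= index.get(toy_stem(token), set())
    return hits


# Same three values as the collection in the question.
docs = {1: "mice", 2: "criteria", 3: "seasons"}
index = build_index(docs)

print(sorted(search(index, "season")))     # [3]: "seasons" and "season" share a stem
print(sorted(search(index, "mouse")))      # []: "mouse" and "mice" stay different
print(sorted(search(index, "criterion")))  # []: no suffix rule links these forms
```

Adding more suffix rules does not solve this in general: relating 'mice' to 'mouse' requires knowledge of irregular forms (a lexicon or lemmatizer), which a purely rule-based stemmer does not encode.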

You can see MongoDB's use of Snowball in its source code.



Source: https://stackoverflow.com/questions/22750643/mongodb-full-text-index-full-text-search-stemming
