Analyzers in Elasticsearch

轻奢々 2020-12-12 17:16

I'm having trouble understanding the concept of analyzers in Elasticsearch with the Tire gem. I'm actually a newbie to these search concepts. Can someone here help me with som…

南笙 (OP)

2020-12-12 17:47
In Lucene, an analyzer combines a tokenizer (splitter) with filters such as a stemmer and a stopword filter.
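
The Lucene-style chain can be sketched in a few lines of plain Python: tokenize, drop stopwords, then stem what remains. This is a toy model, not Lucene's API; the `STOPWORDS` set and the `toy_stem` function are hypothetical simplifications (a real analyzer would use a proper stemmer such as Porter or Snowball).

```python
import re

# Hypothetical, deliberately tiny stopword list (real lists are much longer).
STOPWORDS = {"a", "an", "the", "is", "and", "of"}


def tokenize(text: str) -> list[str]:
    """Tokenizer: split the text into runs of word characters, lowercased."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def remove_stopwords(tokens: list[str]) -> list[str]:
    """Stopword filter: drop very common words that carry little meaning."""
    return [t for t in tokens if t not in STOPWORDS]


def toy_stem(token: str) -> str:
    """Toy stemmer (NOT Snowball): strip 'ness', then turn a final 'y' into 'i'."""
    if token.endswith("ness"):
        token = token[: -len("ness")]
    if token.endswith("y"):
        token = token[:-1] + "i"
    return token


def analyze(text: str) -> list[str]:
    """Analyzer = tokenizer followed by filters, applied in order."""
    return [toy_stem(t) for t in remove_stopwords(tokenize(text))]


print(analyze("The happiness of a happy dog"))
```

Running it prints `['happi', 'happi', 'dog']`: the stopwords disappear and both forms of "happy" collapse to one stem, which is what lets a search for one word match the other.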

In Elasticsearch, an analyzer is a combination of:
1. Character filters: "tidy up" the string before it is tokenized, e.g. remove HTML tags. An analyzer can have zero or more of them.
2. Tokenizer: breaks the string up into individual terms or tokens. An analyzer must have exactly one.
3. Token filters: change, add or remove tokens. A stemmer is an example of a token filter: it reduces a word to its base form, e.g. happy and happiness both reduce to the same stem, happi.
See the Snowball demo here.
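
The three stages and their order can be modelled as a small Python class. This is a sketch under simplifying assumptions, not Elasticsearch's implementation: `ToyAnalyzer` and the filter functions are hypothetical stand-ins (Elasticsearch's own equivalents include the `html_strip` character filter, the `whitespace` tokenizer, and the `lowercase`, `stop` and `synonym` token filters). The three token filters show a filter changing, removing and adding tokens respectively.

```python
import re
from typing import Callable

CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], list[str]]
TokenFilter = Callable[[list[str]], list[str]]


class ToyAnalyzer:
    """Toy model of an Elasticsearch-style analyzer (not the real implementation).

    Order: zero or more character filters -> exactly one tokenizer -> zero or more token filters.
    """

    def __init__(self, char_filters: list[CharFilter], tokenizer: Tokenizer,
                 token_filters: list[TokenFilter]):
        self.char_filters = char_filters
        self.tokenizer = tokenizer  # exactly one tokenizer, by construction
        self.token_filters = token_filters

    def analyze(self, text: str) -> list[str]:
        for cf in self.char_filters:   # 1. tidy up the raw string
            text = cf(text)
        tokens = self.tokenizer(text)  # 2. split it into tokens
        for tf in self.token_filters:  # 3. change, add or remove tokens
            tokens = tf(tokens)
        return tokens


def strip_html(text: str) -> str:
    """Character filter: replace anything that looks like an HTML tag with a space (crude regex)."""
    return re.sub(r"<[^>]+>", " ", text)


def whitespace_tokenizer(text: str) -> list[str]:
    """Tokenizer: split on runs of whitespace."""
    return text.split()


def lowercase(tokens: list[str]) -> list[str]:
    """Token filter that changes tokens."""
    return [t.lower() for t in tokens]


def drop_stopwords(tokens: list[str]) -> list[str]:
    """Token filter that removes tokens (hypothetical three-word stopword list)."""
    return [t for t in tokens if t not in {"the", "a", "is"}]


def add_synonyms(tokens: list[str]) -> list[str]:
    """Token filter that adds tokens (hypothetical one-entry synonym map)."""
    synonyms = {"glad": "happy"}
    out = []
    for t in tokens:
        out.append(t)
        if t in synonyms:
            out.append(synonyms[t])
    return out


analyzer = ToyAnalyzer([strip_html], whitespace_tokenizer,
                       [lowercase, drop_stopwords, add_synonyms])
print(analyzer.analyze("<p>The dog is <b>GLAD</b></p>"))
```

This prints `['dog', 'glad', 'happy']`: the HTML tags are stripped before tokenizing, and the synonym filter emits an extra `happy` token next to `glad`.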

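Here is a sample index setting that defines a custom analyzer, `analyzerWithSnowball`, built from the standard tokenizer, a lowercase filter and an English Snowball stemmer: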
```
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "analyzerWithSnowball": {
            "tokenizer": "standard",
            "filter": ["lowercase", "englishSnowball"]
          }
        },
        "filter": {
          "englishSnowball": {
            "type": "snowball",
            "language": "english"
          }
        }
      }
    }
  }
}
```
The `standard` token filter that older examples list first in `filter` was a no-op and was removed in Elasticsearch 7.0, so it is left out here.
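
To try the analyzer, you would create an index with these settings and then call the `_analyze` API on it. The sketch below only builds the two JSON request bodies with Python's standard `json` module and makes no HTTP calls; the index name `my_index`, the sample text and the `BUILTIN_FILTERS` set are assumptions for illustration. It also checks that each filter named by the analyzer is either defined under `filter` or assumed to be built in.

```python
import json

# Same analyzer as the sample settings; the built-in "standard" token filter
# is left out because Elasticsearch 7.0 removed it.
settings = {
    "settings": {
        "index": {
            "analysis": {
                "analyzer": {
                    "analyzerWithSnowball": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "englishSnowball"],
                    }
                },
                "filter": {
                    "englishSnowball": {"type": "snowball", "language": "english"}
                },
            }
        }
    }
}

analysis = settings["settings"]["index"]["analysis"]

# Assumption: the only built-in token filter this sketch relies on.
BUILTIN_FILTERS = {"lowercase"}
known_filters = set(analysis["filter"]) | BUILTIN_FILTERS

# Sanity check: every filter an analyzer lists must be built in or defined above.
for name, spec in analysis["analyzer"].items():
    missing = [f for f in spec["filter"] if f not in known_filters]
    assert not missing, f"{name} references undefined filters: {missing}"

# Body for PUT /my_index (hypothetical index name) to create the index.
create_body = json.dumps(settings, indent=2)

# Body for GET /my_index/_analyze to run the custom analyzer on sample text.
analyze_body = json.dumps({"analyzer": "analyzerWithSnowball", "text": "happy happiness"})

print(create_body)
print(analyze_body)
```

Sending `analyze_body` to a running cluster should show both words reduced to the same Snowball stem.
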
Ref:
1. Comparison of Lucene Analyzers
2. http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/custom-analyzers.html