Analyzers in Elasticsearch

轻奢々 2020-12-12 17:16

I'm having trouble understanding the concept of analyzers in Elasticsearch with the Tire gem. I'm a newbie to these search concepts. Can someone here help me with som

3 Answers
  •  南笙 (original poster)
    2020-12-12 17:47

    In Lucene, an analyzer is a combination of a tokenizer (splitter), a stemmer, and a stopword filter.

    In Elasticsearch, an analyzer is a combination of three kinds of components, applied in this order:

    1. Character filters (zero or more): "tidy up" the string before it is tokenized, e.g. remove HTML tags.
    2. Tokenizer (exactly one): breaks the string up into individual terms, or tokens.
    3. Token filters (zero or more): change, add, or remove tokens. A stemmer is one example of a token filter. It reduces each word to its base form, e.g. "happy" and "happiness" both reduce to the same stem, "happi".
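
The three stages above can be sketched in plain Python to show how data flows through an analyzer. This is only an illustration of the pipeline idea, not how Elasticsearch or Lucene implement it: every function name here is hypothetical, the stopword list is made up, and `toy_stem` is a deliberately crude stand-in for a real Snowball stemmer.

```python
import re
from html import unescape

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "is", "the"}


def strip_html(text):
    """Character filter: drop HTML tags and decode entities before tokenizing."""
    return unescape(re.sub(r"<[^>]+>", " ", text))


def tokenize(text):
    """Tokenizer: split the cleaned string into alphanumeric tokens."""
    return re.findall(r"[A-Za-z0-9]+", text)


def lowercase(tokens):
    """Token filter: change tokens (normalize case)."""
    return [t.lower() for t in tokens]


def remove_stopwords(tokens):
    """Token filter: remove tokens that carry little meaning."""
    return [t for t in tokens if t not in STOPWORDS]


def toy_stem(tokens):
    """Token filter: toy stemmer (NOT Snowball). Strips 'ness', maps final 'y' to 'i'."""
    stemmed = []
    for t in tokens:
        if t.endswith("ness") and len(t) > 6:
            t = t[:-4]
        if t.endswith("y") and len(t) > 2:
            t = t[:-1] + "i"
        stemmed.append(t)
    return stemmed


def analyze(text,
            char_filters=(strip_html,),
            tokenizer=tokenize,
            token_filters=(lowercase, remove_stopwords, toy_stem)):
    """Run char filters, then exactly one tokenizer, then token filters, in order."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("<p>Happy AND the Happiness</p>"))  # ['happi', 'happi']
```

Because the same chain runs on both the indexed text and the query text, a search for "happiness" can match a document that only contains "happy".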

    See Snowball demo here

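    This is a sample index setting that defines a custom analyzer using the standard tokenizer and an English Snowball stemmer. Note that the "standard" token filter was removed in Elasticsearch 7.0, so drop it from the filter list on newer versions: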

        {
            "settings": {
                "index": {
                    "analysis": {
                        "analyzer": {
                            "analyzerWithSnowball": {
                                "tokenizer": "standard",
                                "filter": ["standard", "lowercase", "englishSnowball"]
                            }
                        },
                        "filter": {
                            "englishSnowball": {
                                "type": "snowball",
                                "language": "english"
                            }
                        }
                    }
                }
            }
        }
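
To try settings like these, one option is to create an index with them and then ask the `_analyze` API what tokens the analyzer produces. The sketch below only builds the two HTTP requests using Python's standard library. The server address `http://localhost:9200` and the index name `articles` are assumptions, not something from the question, and the "standard" token filter is left out by default because it was removed in Elasticsearch 7.0.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local, unsecured Elasticsearch node
INDEX = "articles"                # hypothetical index name


def build_index_settings(include_standard_filter=False):
    """Return the settings body from the answer as a Python dict."""
    filters = ["lowercase", "englishSnowball"]
    if include_standard_filter:
        # Only valid on versions before 7.0, where this filter still exists.
        filters.insert(0, "standard")
    return {
        "settings": {
            "index": {
                "analysis": {
                    "analyzer": {
                        "analyzerWithSnowball": {
                            "tokenizer": "standard",
                            "filter": filters,
                        }
                    },
                    "filter": {
                        "englishSnowball": {
                            "type": "snowball",
                            "language": "english",
                        }
                    },
                }
            }
        }
    }


def build_request(method, path, body):
    """Build (but do not send) a JSON request against the assumed server."""
    return urllib.request.Request(
        url=ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )


def send(request):
    """Send a request and decode the JSON response. Needs a running server."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# PUT /articles creates the index with the custom analyzer.
create_req = build_request("PUT", "/" + INDEX, build_index_settings())

# POST /articles/_analyze runs sample text through that analyzer.
analyze_req = build_request(
    "POST",
    "/" + INDEX + "/_analyze",
    {"analyzer": "analyzerWithSnowball", "text": "happy happiness"},
)

# With a server running, uncomment to execute:
# print(send(create_req))
# print(send(analyze_req))
```

The `_analyze` response lists each token the analyzer emits, which is a quick way to check whether a stemmer or filter behaves the way you expect before indexing any documents.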
    

    Ref:

    1. Comparison of Lucene Analyzers
    2. http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/custom-analyzers.html
