I'm having trouble understanding the concept of analyzers in Elasticsearch with the tire gem. I'm actually a newbie to these search concepts. Can someone here help me with some basics?
In Lucene, an analyzer is a combination of a tokenizer (splitter), a stemmer, and a stopword filter.
In Elasticsearch, an analyzer is a combination of:

- Character filter: "tidies up" a string before it is tokenized, e.g. by stripping HTML tags.
- Tokenizer: breaks the string up into individual terms or tokens. An analyzer must have exactly one tokenizer.
- Token filter: changes, adds, or removes tokens. A stemmer is an example of a token filter: it reduces a word to its base form, e.g. happy and happiness both reduce to the base happi (see the Snowball demo, and the sketch below).
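A quick way to watch a stemmer work is to call the _analyze API directly. This is a minimal Ruby sketch, assuming Elasticsearch is listening on localhost:9200 and using the built-in snowball analyzer purely for illustration:

require 'net/http'
require 'json'

# Send the text to the _analyze endpoint and print the resulting tokens.
http     = Net::HTTP.new('localhost', 9200)
response = http.post('/_analyze?analyzer=snowball', 'Happiness happy')

JSON.parse(response.body)['tokens'].each do |t|
  puts t['token']   # both words come back stemmed to "happi"
end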
This is a sample setting:
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "analyzerWithSnowball": {
            "tokenizer": "standard",
            "filter": ["standard", "lowercase", "englishSnowball"]
          }
        },
        "filter": {
          "englishSnowball": {
            "type": "snowball",
            "language": "english"
          }
        }
      }
    }
  }
}
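Since the question mentions the tire gem, here is a rough sketch of how the same analysis settings could be supplied when creating an index through Tire. The index name "articles" and the "title" mapping are made-up placeholders for the example:

require 'tire'

Tire.index 'articles' do
  delete   # drop any existing index so the new settings take effect
  create(
    :settings => {
      :analysis => {
        :filter => {
          :englishSnowball => { :type => 'snowball', :language => 'english' }
        },
        :analyzer => {
          :analyzerWithSnowball => {
            :tokenizer => 'standard',
            :filter    => ['standard', 'lowercase', 'englishSnowball']
          }
        }
      }
    },
    :mappings => {
      :article => {
        :properties => {
          # Fields that should be stemmed point at the custom analyzer.
          :title => { :type => 'string', :analyzer => 'analyzerWithSnowball' }
        }
      }
    }
  )
end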