As I am new to Elasticsearch, I am not able to identify the difference between the ngram token filter and the edge_ngram token filter. How do these two differ from each other?
ngram moves the cursor while breaking the text:

Text: Red Wine
Options:
min_gram: 2
max_gram: 3
Result: Re, Red, ed, Wi, Win, in, ine, ne

As you can see, at each position the cursor emits fragments from min_gram up to max_gram characters long, then moves one step forward and repeats until it reaches the end of the text.
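To make that concrete, here is a minimal JavaScript sketch of the sliding window (not Elasticsearch's actual implementation; it assumes the space has already split Red Wine into the two words Red and Wine, as the letter/digit token_chars setting below does):

function ngrams(word, minGram, maxGram) {
  const result = [];
  // the cursor (start position) advances one character at a time
  for (let start = 0; start + minGram <= word.length; start++) {
    // at each position, emit fragments from minGram up to maxGram characters
    for (let len = minGram; len <= maxGram && start + len <= word.length; len++) {
      result.push(word.substring(start, start + len));
    }
  }
  return result;
}

console.log(ngrams("Red", 2, 3));  // [ 'Re', 'Red', 'ed' ]
console.log(ngrams("Wine", 2, 3)); // [ 'Wi', 'Win', 'in', 'ine', 'ne' ]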
edge_ngram does the exact same thing as ngram, but it doesn't move the cursor:

Text: Red Wine
Options:
min_gram: 2
max_gram: 3
Result: Re, Red

Why didn't it return Win? Because the cursor never moves: it always starts at position zero, emits fragments from min_gram up to max_gram characters, and stays anchored at that same position (which is always zero).
Think of edge_ngram as if it were the substring function in other programming languages such as JavaScript:
// ngram
let str = "Red Wine";
console.log(str.substring(0, 2)); // Re
console.log(str.substring(0, 3)); // Red
console.log(str.substring(1, 3)); // ed, start from position 1
// ...
// edge_ngram
// notice that the position is always zero
console.log(str.substring(0, 2)); // Re
console.log(str.substring(0, 3)); // Red
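Following the same analogy, edge_ngram can be sketched as a loop that only ever grows the substring from position zero (again, just an illustration, not the real implementation):

function edgeNgrams(text, minGram, maxGram) {
  const result = [];
  // the start position is always zero; only the length grows
  for (let len = minGram; len <= maxGram && len <= text.length; len++) {
    result.push(text.substring(0, len));
  }
  return result;
}

console.log(edgeNgrams("Red Wine", 2, 3)); // [ 'Re', 'Red' ]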
Try it out yourself using Kibana:
PUT my_index
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 3,
          "token_chars": [
            "letter",
            "digit"
          ]
        },
        "my_edge_ngram_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 3
        }
      }
    }
  }
}
POST my_index/_analyze
{
  "tokenizer": "my_ngram_tokenizer",
  "text": "Red Wine"
}

POST my_index/_analyze
{
  "tokenizer": "my_edge_ngram_tokenizer",
  "text": "Red Wine"
}
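If the index was created as above, the first call should return the eight ngram tokens (Re, Red, ed, Wi, Win, in, ine, ne) and the second only Re and Red, because my_edge_ngram_tokenizer defines no token_chars and stays anchored at the start of the whole string. The _analyze response lists them in a tokens array, roughly of this shape (other fields omitted):

{
  "tokens": [
    { "token": "Re", ... },
    { "token": "Red", ... },
    ...
  ]
}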