As I am new to elastic search, I am not able to identify difference between ngram token filter and edge ngram token filter.
How these two differ f
I think the documentation is pretty clear on this:
This tokenizer is very similar to nGram but only keeps n-grams which start at the beginning of a token.
And the best example for nGram tokenizer again comes from the documentation:
curl 'localhost:9200/test/_analyze?pretty=1&analyzer=my_ngram_analyzer' -d 'FC Schalke 04'
# FC, Sc, Sch, ch, cha, ha, hal, al, alk, lk, lke, ke, 04
With this tokenizer definition:
"type" : "nGram",
"min_gram" : "2",
"max_gram" : "3",
"token_chars": [ "letter", "digit" ]
In short:
FC, Schalke, 04.nGram generates groups of characters of minimum min_gram size and maximum max_gram size from an input text. Basically, the tokens are split into small chunks and each chunk is anchored on a character (it doesn't matter where this character is, all of them will create chunks).edgeNGram does the same but the chunks always start from the beginning of each token. Basically, the chunks are anchored at the beginning of the tokens.For the same text as above, an edgeNGram generates this: FC, Sc, Sch, Scha, Schal, 04. Every "word" in the text is considered and for every "word" the first character is the starting point (F from FC, S from Schalke and 0 from 04).