As I am new to Elasticsearch, I am not able to identify the difference between the ngram token filter and the edge_ngram token filter. How do these two differ from each other?
ngram moves the cursor while breaking the text:

Text: Red Wine
Options:
min_gram: 2
max_gram: 3
Result: Re, Red, ed, Wi, Win, in, ine, ne

As you can see, at each position the cursor emits fragments from min_gram up to max_gram characters long, then moves one step forward and repeats until it reaches the end of the text.
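To make that concrete, here is a minimal JavaScript sketch of the sliding window (not Elasticsearch's actual implementation; it assumes the space has already split Red Wine into the two words Red and Wine, as the letter/digit token_chars setting below does):

function ngrams(word, minGram, maxGram) {
  const result = [];
  // the cursor (start position) advances one character at a time
  for (let start = 0; start + minGram <= word.length; start++) {
    // at each position, emit fragments from minGram up to maxGram characters
    for (let len = minGram; len <= maxGram && start + len <= word.length; len++) {
      result.push(word.substring(start, start + len));
    }
  }
  return result;
}

console.log(ngrams("Red", 2, 3));  // [ 'Re', 'Red', 'ed' ]
console.log(ngrams("Wine", 2, 3)); // [ 'Wi', 'Win', 'in', 'ine', 'ne' ]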
edge_ngram does the exact same thing as ngram, but it doesn't move the cursor:

Text: Red Wine
Options:
min_gram: 2
max_gram: 3
Result: Re, Red

Why didn't it return Win? Because the cursor never moves: it always starts at position zero, emits fragments from min_gram up to max_gram characters, and stays anchored at that same position (which is always zero).
Think of edge_ngram as if it were the substring function in other programming languages such as JavaScript:
// ngram
let str = "Red Wine";
console.log(str.substring(0, 2)); // Re
console.log(str.substring(0, 3)); // Red
console.log(str.substring(1, 3)); // ed, start from position 1
// ...
// edge_ngram
// notice that the position is always zero
console.log(str.substring(0, 2)); // Re
console.log(str.substring(0, 3)); // Red
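Following the same analogy, edge_ngram can be sketched as a loop that only ever grows the substring from position zero (again, just an illustration, not the real implementation):

function edgeNgrams(text, minGram, maxGram) {
  const result = [];
  // the start position is always zero; only the length grows
  for (let len = minGram; len <= maxGram && len <= text.length; len++) {
    result.push(text.substring(0, len));
  }
  return result;
}

console.log(edgeNgrams("Red Wine", 2, 3)); // [ 'Re', 'Red' ]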
Try it out yourself using Kibana:
PUT my_index
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 3,
          "token_chars": [
            "letter",
            "digit"
          ]
        },
        "my_edge_ngram_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 3
        }
      }
    }
  }
}
POST my_index/_analyze
{
  "tokenizer": "my_ngram_tokenizer",
  "text": "Red Wine"
}

POST my_index/_analyze
{
  "tokenizer": "my_edge_ngram_tokenizer",
  "text": "Red Wine"
}
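If the index was created as above, the first call should return the eight ngram tokens (Re, Red, ed, Wi, Win, in, ine, ne) and the second only Re and Red, because my_edge_ngram_tokenizer defines no token_chars and stays anchored at the start of the whole string. The _analyze response lists them in a tokens array, roughly of this shape (other fields omitted):

{
  "tokens": [
    { "token": "Re", ... },
    { "token": "Red", ... },
    ...
  ]
}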