How does the edge ngram token filter differ from the ngram token filter?

时光取名叫无心 2021-01-07 17:38

As I am new to Elasticsearch, I am not able to identify the difference between the ngram token filter and the edge ngram token filter.

How these two differ f

2 answers
  •  长发绾君心
    2021-01-07 18:22

    ngram moves a cursor along the text while breaking it up:

    Text: Red Wine
    
    Options:
        min_gram: 2
        max_gram: 3
    
    Result: Re, Red, ed, Wi, Win, in, ine, ne
    

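    As you can see, at each cursor position it emits every fragment whose length is between min_gram and max_gram, then moves the cursor one character forward and repeats until it reaches the end of the word.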
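
    The sliding behaviour described above can be sketched in plain JavaScript. This is only an illustrative model, not Elasticsearch's implementation; the names ngrams and ngramTokens are hypothetical, and the split on non-alphanumeric characters is an assumption that mimics token_chars set to letter and digit for ASCII input.

```javascript
// Illustrative model only, not Elasticsearch's implementation.
// Emits every substring of length minGram..maxGram at every start position.
function ngrams(word, minGram, maxGram) {
  const grams = [];
  for (let start = 0; start < word.length; start++) {
    for (let len = minGram; len <= maxGram; len++) {
      if (start + len > word.length) break; // gram would run past the end
      grams.push(word.substring(start, start + len));
    }
  }
  return grams;
}

// Split on anything that is not an ASCII letter or digit (assumption:
// this stands in for token_chars: ["letter", "digit"]).
function ngramTokens(text, minGram, maxGram) {
  return text
    .split(/[^A-Za-z0-9]+/)
    .filter((word) => word.length > 0)
    .flatMap((word) => ngrams(word, minGram, maxGram));
}

console.log(ngramTokens("Red Wine", 2, 3));
// [ 'Re', 'Red', 'ed', 'Wi', 'Win', 'in', 'ine', 'ne' ]
```

    The outer loop is the moving cursor; the inner loop is the growth from the minimum to the maximum length.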


    edge_ngram does the same thing as ngram, but it does not move the cursor:

    Text: Red Wine
    
    Options:
        min_gram: 2
        max_gram: 3
    
    Result: Re, Red
    

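    Why didn't it return Win? Because the cursor never moves: every fragment starts at position zero, and only its length grows from min_gram to max_gram. Note that this result assumes the whole text is treated as a single token, as with the edge_ngram tokenizer in the Kibana example below, which sets no token_chars. If the text is split into words first (for example by setting token_chars to letter, or by applying the edge_ngram token filter after a word tokenizer), each word gets its own edge n-grams, so Wi and Win would appear as well.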
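
    A minimal sketch of the fixed-start behaviour, again as an illustrative model rather than Elasticsearch's own code. The function name edgeNgrams is hypothetical, and splitting on a single space is an assumption standing in for a word tokenizer.

```javascript
// Illustrative model only: the start position is fixed at 0,
// and only the length grows from minGram to maxGram.
function edgeNgrams(text, minGram, maxGram) {
  const grams = [];
  for (let len = minGram; len <= maxGram && len <= text.length; len++) {
    grams.push(text.substring(0, len));
  }
  return grams;
}

// Whole text treated as one token.
console.log(edgeNgrams("Red Wine", 2, 3));
// [ 'Re', 'Red' ]

// Text split into words first, then edge n-grams taken per word.
console.log("Red Wine".split(" ").flatMap((w) => edgeNgrams(w, 2, 3)));
// [ 'Re', 'Red', 'Wi', 'Win' ]
```

    The second call shows why the outcome depends on whether the input reaches the edge n-gram step as one token or as separate words.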


    Think of edge_ngram as a substring function whose start index is always zero, as found in programming languages such as JavaScript:

    // ngram
    let str = "Red Wine";
    console.log(str.substring(0, 2)); // Re
    console.log(str.substring(0, 3)); // Red
    console.log(str.substring(1, 3)); // ed, start from position 1
    // ...
    
    // edge_ngram
    // notice that the start position is always zero
    console.log(str.substring(0, 2)); // Re
    console.log(str.substring(0, 3)); // Red
    

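    Try it yourself in Kibana. This example uses the ngram and edge_ngram tokenizers; the token filters of the same names accept the same min_gram and max_gram settings: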

    PUT my_index
    {
      "settings": {
        "analysis": {
          "tokenizer": {
            "my_ngram_tokenizer" : {
              "type" : "ngram",
              "min_gram": 2,
              "max_gram": 3,
              "token_chars": [
                "letter",
                "digit"
              ]
            },
            "my_edge_ngram_tokenizer": {
              "type": "edge_ngram",
              "min_gram": 2,
              "max_gram": 3
            }
          }
        }
      }
    }
    
    POST my_index/_analyze
    {
      "tokenizer": "my_ngram_tokenizer",
      "text": "Red Wine"
    }
    
    POST my_index/_analyze
    {
      "tokenizer": "my_edge_ngram_tokenizer", 
      "text": "Red Wine"
    }
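
    Since the question asks about the token filters rather than the tokenizers, the same comparison can be sketched against the _analyze API with an inline filter definition. This is a sketch under stated assumptions: buildAnalyzeBody and analyze are hypothetical helper names, the cluster is assumed to be an unsecured local node at http://localhost:9200, and the runtime is assumed to provide a global fetch (for example Node 18 or later).

```javascript
// Hypothetical helper: builds an _analyze request body that applies an
// inline n-gram token filter ("ngram" or "edge_ngram") after the
// standard tokenizer.
function buildAnalyzeBody(filterType, text) {
  return {
    tokenizer: "standard",
    filter: [{ type: filterType, min_gram: 2, max_gram: 3 }],
    text: text,
  };
}

// Assumption: an unsecured local node at http://localhost:9200 and a
// runtime with a global fetch.
async function analyze(filterType, text) {
  const response = await fetch("http://localhost:9200/_analyze", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildAnalyzeBody(filterType, text)),
  });
  const result = await response.json();
  // Each entry in result.tokens carries the emitted term in its "token" field.
  return result.tokens.map((t) => t.token);
}

// Usage (requires a running cluster):
// analyze("ngram", "Red Wine").then(console.log);
// analyze("edge_ngram", "Red Wine").then(console.log);
```

    Because the standard tokenizer splits the text into words before the filter runs, the edge_ngram filter here should produce edge n-grams for both Red and Wine.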
    
