ElasticSearch: EdgeNgrams and Numbers

醉梦人生 2020-12-19 08:03

Any ideas on how EdgeNgram treats numbers?

I'm running Haystack with an ElasticSearch backend. I created an indexed field of type EdgeNgram. This field will conta

2 answers
  •  南笙 (original poster)
    2020-12-19 08:42

    If you're using the edgeNGram tokenizer, it will treat "EdgeNGram 12323" as a single token and then apply the edge n-gram process to that whole string, starting from its first character. For example, with min_gram=1 and max_gram=4, the tokens indexed are ["E", "Ed", "Edg", "Edge"], so the number is never reached. That is probably not what you're looking for; consider using the edgeNGram token filter instead.

    If you're using the edgeNGram token filter, make sure the tokenizer in front of it actually splits the text "EdgeNGram 12323" into two tokens: ["EdgeNGram", "12323"] (the standard or whitespace tokenizer will do the trick). Then apply the edgeNGram filter after the tokenizer, so that each token gets its own set of edge n-grams.
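    As a rough sketch of that setup, the index settings could look something like the Python dict below. The names edge_filter and edge_analyzer are made up for illustration, the min_gram/max_gram values just reuse the ones from the example above, and depending on your ElasticSearch version the filter type may need to be spelled "edge_ngram" rather than "edgeNGram".

```python
import json

# Hypothetical analysis settings: whitespace tokenizer followed by an
# edge n-gram token filter. "edge_filter" and "edge_analyzer" are
# illustrative names, not anything Haystack or ElasticSearch defines.
index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "edge_filter": {
                    # Newer ElasticSearch versions spell this "edge_ngram".
                    "type": "edgeNGram",
                    "min_gram": 1,
                    "max_gram": 4,
                }
            },
            "analyzer": {
                "edge_analyzer": {
                    "type": "custom",
                    # Splits "EdgeNGram 12323" into two tokens first.
                    "tokenizer": "whitespace",
                    # Then each token is expanded into its edge n-grams.
                    "filter": ["edge_filter"],
                }
            },
        }
    }
}

# Print the JSON body you would send when creating the index.
print(json.dumps(index_settings, indent=2))
```

    How you hand these settings to the index depends on your Haystack backend configuration, so treat this only as the shape of the analysis section.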

    In general, edgeNGram treats digits like any other characters: it will take "12323" and produce prefix tokens such as "1", "12", "123", and so on, up to max_gram characters.
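    To make the difference concrete, here is a small pure-Python simulation of the two behaviours. It does not call ElasticSearch at all; the function names are invented for this sketch, and str.split() stands in for the whitespace tokenizer.

```python
def edge_ngrams(token, min_gram=1, max_gram=4):
    """Return the prefixes of `token` with lengths min_gram..max_gram."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def tokenizer_style(text, min_gram=1, max_gram=4):
    # Mimics the edgeNGram tokenizer as described above:
    # the whole input is treated as one token.
    return edge_ngrams(text, min_gram, max_gram)


def filter_style(text, min_gram=1, max_gram=4):
    # Mimics whitespace tokenizer + edgeNGram token filter:
    # split first, then take edge n-grams of every token.
    grams = []
    for token in text.split():
        grams.extend(edge_ngrams(token, min_gram, max_gram))
    return grams


text = "EdgeNGram 12323"
print(tokenizer_style(text))  # ['E', 'Ed', 'Edg', 'Edge']
print(filter_style(text))     # ['E', 'Ed', 'Edg', 'Edge', '1', '12', '123', '1232']
```

    With the filter approach a query for "123" has an indexed token to match, whereas with the tokenizer approach the digits never make it into the index.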
