Emulate a SQL LIKE search with ElasticSearch

ε祈祈猫儿з submitted on 2019-12-05 10:29:11

I would do it like this:

  • change the tokenizer to edge_nGram since you said you need LIKE 'CityName%' (meaning a prefix match):
  "tokenizer": {
    "autocomplete_edge": {
      "type": "edge_nGram",
      "min_gram": 1,
      "max_gram": 100
    }
  }
  • have the field specify your autocomplete_search as its search_analyzer. I think a keyword tokenizer plus a lowercase filter is a good choice for that analyzer:
  "mappings": {
    "listing": {
      "properties": {
        "city": {
          "type": "string",
          "index_analyzer": "autocomplete_term",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
  • and the query itself is as simple as:
{
  "query": {
    "multi_match": {
      "query": "R",
      "fields": [
        "city"
      ]
    }
  }
}
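
To see how the pieces above fit together, here is a minimal Python sketch that assembles them into a single index-creation body. The tokenizer and the mapping are copied from the snippets above. The two analyzer definitions are assumptions: this answer names autocomplete_term and autocomplete_search but does not show them, so the sketch guesses an edge-ngram analyzer for indexing and a keyword-plus-lowercase analyzer for searching. build_index_body is a hypothetical helper name.

```python
import json


def build_index_body():
    """Assemble an index-creation body from the fragments in this answer."""
    analysis = {
        "tokenizer": {
            # Copied from the tokenizer snippet above.
            "autocomplete_edge": {
                "type": "edge_nGram",
                "min_gram": 1,
                "max_gram": 100,
            }
        },
        "analyzer": {
            # ASSUMPTION: not shown in the answer; index time uses the
            # edge-ngram tokenizer and lowercases every token.
            "autocomplete_term": {
                "type": "custom",
                "tokenizer": "autocomplete_edge",
                "filter": ["lowercase"],
            },
            # ASSUMPTION: not shown in the answer; search time keeps the
            # input as one token (keyword) and only lowercases it.
            "autocomplete_search": {
                "type": "custom",
                "tokenizer": "keyword",
                "filter": ["lowercase"],
            },
        },
    }
    mappings = {
        # Copied from the mapping snippet above.
        "listing": {
            "properties": {
                "city": {
                    "type": "string",
                    "index_analyzer": "autocomplete_term",
                    "search_analyzer": "autocomplete_search",
                }
            }
        }
    }
    return {"settings": {"analysis": analysis}, "mappings": mappings}


if __name__ == "__main__":
    # Print the body so it can be pasted into an index-creation request.
    print(json.dumps(build_index_body(), indent=2))
```

The field and analyzer names follow the mapping above; adjust the analyzer definitions to match whatever your index already declares.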

The detailed explanation goes like this: split your city names into edge ngrams. For example, for Rio de Janeiro you'll index something like:

           "city": [
              "r",
              "ri",
              "rio",
              "rio ",
              "rio d",
              "rio de",
              "rio de ",
              "rio de j",
              "rio de ja",
              "rio de jan",
              "rio de jane",
              "rio de janei",
              "rio de janeir",
              "rio de janeiro"
           ]
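
The list above can be reproduced with a few lines of Python. This is only a simulation of what the index ends up holding, not Elasticsearch's own code. It assumes the tokenizer treats the whole city name as one string (spaces included) and that tokens are lowercased; edge_ngrams is a hypothetical helper name.

```python
def edge_ngrams(text, min_gram=1, max_gram=100):
    """Return the lowercased prefixes of text, shortest first.

    Mirrors the min_gram / max_gram settings of the tokenizer above:
    prefixes shorter than min_gram or longer than max_gram are skipped.
    """
    lowered = text.lower()
    longest = min(len(lowered), max_gram)
    return [lowered[:n] for n in range(min_gram, longest + 1)]


if __name__ == "__main__":
    # repr() makes the trailing spaces in tokens such as 'rio ' visible.
    for token in edge_ngrams("Rio de Janeiro"):
        print(repr(token))
```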

You notice that everything is lowercased. Now, you'd want your query to take any text (lowercase or not) and to match it with what's in the index. So, an R should match that list above.

For this to happen you want the input text to be lowercased but otherwise kept exactly as the user typed it, meaning it shouldn't be split into tokens. Why would you want this? Because you have already split the city names into ngrams and you don't want the same for the input text. If the user inputs "RI", Elasticsearch will lowercase it to ri and match it exactly against what it has in the index.
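
The asymmetry between index time and search time can be sketched in plain Python. This is a toy model under stated assumptions, not how Elasticsearch stores data: a dict stands in for the inverted index, and edge_ngrams, build_index and search are hypothetical helper names. City names are expanded into prefixes when indexed, while the user's input is only lowercased and looked up as a single exact token.

```python
def edge_ngrams(text, max_gram=100):
    """Index-time analysis: every lowercased prefix of the full text."""
    lowered = text.lower()
    return {lowered[:n] for n in range(1, min(len(lowered), max_gram) + 1)}


def build_index(cities):
    """Map each indexed token to the set of cities that produced it."""
    index = {}
    for city in cities:
        for token in edge_ngrams(city):
            index.setdefault(token, set()).add(city)
    return index


def search(index, user_input):
    """Search-time analysis: lowercase the input, keep it as one token,
    then do an exact lookup. No ngrams are generated for the query."""
    return sorted(index.get(user_input.lower(), set()))


if __name__ == "__main__":
    index = build_index(["Rio de Janeiro", "Riga", "Rome"])
    print(search(index, "RI"))  # cities whose name starts with "ri"
    print(search(index, "de"))  # empty: "de" is not a prefix of any name
```

Because only prefixes of the whole name are indexed, the lookup behaves like LIKE 'text%' rather than LIKE '%text%'.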

A probably faster alternative to multi_match is to use a term filter, but this requires your application/website to lowercase the text itself. The reason for this is that term doesn't analyze the input text at all.

{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "city": {
            "value": "ri"
          }
        }
      }
    }
  }
}
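
Since term skips analysis, the lowercasing has to happen in your application before the request is sent. A minimal sketch of that step in Python, where build_term_query is a hypothetical helper and the query shape is copied from the request above:

```python
import json


def build_term_query(user_input, field="city"):
    """Build the term-filter request body for a prefix typed by the user.

    The input is lowercased here because a term filter compares the value
    as-is against the (lowercased) tokens stored in the index.
    """
    value = user_input.lower()
    return {
        "query": {
            "filtered": {
                "filter": {
                    "term": {field: {"value": value}}
                }
            }
        }
    }


if __name__ == "__main__":
    # Serialize the body; how it is sent to Elasticsearch is up to the caller.
    print(json.dumps(build_term_query("RI"), indent=2))
```

Python's str.lower() is assumed to be close enough to the index-time lowercase filter for ordinary city names; unusual Unicode input may need extra care.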

Elasticsearch also has a Completion Suggester for providing suggestions.
