Elastic Search Case Insensitive query with prefix query

Posted by 大城市里の小女人 on 2019-12-08 11:26:36

Question


I am new to Elasticsearch. I have the query below:

GET deals2/_search
{
  "size": 200,
  "_source": ["acquireInfo"],
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "fields": ["acquireInfo.company_name.keyword"],
            "query": "az*"
          }
        }
      ]
    }
  }
}

I want Elasticsearch to return results case-insensitively, i.e. every string that starts with any of the following:

"Az"
"AZ"
"az"
"aZ"

But I am not getting all of these results. Can anyone please help me with this?

Example: I have 4 documents:

1) Aziia Avto Ust-Kamenogorsk OOO
2) AZ Infotech Inc
3) AZURE Midstream Partners LP
4) State Oil Fund of the Republic of Azerbaijan

Searching for az should return only the first 3 docs, because they start with az (ignoring case). It should not return the 4th one, which also contains az but not at the beginning.


Answer 1:


This is happening because company_name is indexed as a keyword field, and that is the field your query searches.

The keyword analyzer is a "noop" analyzer that returns the entire input string as a single token. For example, company names foo, Foo and fOo are each stored with their original case, and searching for foo matches only foo, because Elasticsearch ultimately works on token matching, which is case-sensitive.
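
To make the case-sensitivity point concrete, here is a minimal Python sketch. It is not Elasticsearch code: `keyword_token` and `term_matches` are hypothetical helper names that only mimic the behaviour described above, assuming a keyword field with no normalizer.

```python
def keyword_token(value: str) -> str:
    """Mimic a keyword field without a normalizer: one token, stored unchanged."""
    return value


def term_matches(indexed_token: str, query_term: str) -> bool:
    """Token matching is an exact comparison, so a difference in case is a mismatch."""
    return indexed_token == query_term


# Three company names that differ only in case.
indexed = [keyword_token(v) for v in ["foo", "Foo", "fOo"]]

# Searching for "foo" finds only the token stored exactly as "foo".
matches = [t for t in indexed if term_matches(t, "foo")]
print(matches)
```

Lowercasing the stored token (and searching with a lowercase term) is what removes the mismatch, which is the idea behind the fix that follows.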

What you need is the standard analyzer, or a custom analyzer that also covers your other use cases, with the lowercase token filter applied to the field. Then use the match query, which is analyzed and uses the same analyzer that was used to index the field. This way your search query generates the same tokens that are stored in the index, and your search becomes case-insensitive.

Edit: After a discussion with the user in chat, I am updating the answer to suit his requirements, which are below:-

Step 1:- Define settings and mapping for index.

Endpoint :- http://{{hostname}}:{{port}}/{{index}}

{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "company_name": {
        "type": "keyword",
        "normalizer": "my_normalizer"
      }
    }
  }
}

Step 2: Index all the documents

Endpoint: http://{{hostname}}:{{port}}/{{index}}/_doc/ --> 1,2,3,4 etc

{
    "company_name" : "State Oil Fund of the Republic of Azerbaijan"
}
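
If you would rather load the four example documents in one request than one by one, the following Python sketch builds a request body for the `_bulk` endpoint (`http://{{hostname}}:{{port}}/{{index}}/_bulk`, sent with `Content-Type: application/x-ndjson`). This is an alternative to the step above, not what the answer used; `build_bulk_body` is a hypothetical helper, and the ids 1 to 4 are an assumption taken from the "1,2,3,4" hint.

```python
import json

COMPANY_NAMES = [
    "Aziia Avto Ust-Kamenogorsk OOO",
    "AZ Infotech Inc",
    "AZURE Midstream Partners LP",
    "State Oil Fund of the Republic of Azerbaijan",
]


def build_bulk_body(names):
    """Build newline-delimited JSON: one action line, then one source line, per document."""
    lines = []
    for doc_id, name in enumerate(names, start=1):
        lines.append(json.dumps({"index": {"_id": str(doc_id)}}))  # action line
        lines.append(json.dumps({"company_name": name}))           # document source
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body(COMPANY_NAMES)
print(bulk_body)
```

The sketch only produces the body as a string; sending it (with curl, Postman or a client library) is left out so that it runs without a cluster.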

Step 3:- Search query

Endpoint:- http://{{hostname}}:{{port}}/{{index}}/_search

{
  "query": {
    "prefix": { "company_name": "az" }
  }
}

This returns the expected results below:-

{
    "took": 870,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": 1,
        "hits": [
            {
                "_index": "prerfixsearch",
                "_type": "_doc",
                "_id": "2ec9df0fc-dc04-47bb-914f-91a9f20d09efd15f2506-293f-4fb2-bdc3-925684a930b5",
                "_score": 1,
                "_source": {
                    "company_name": "AZ Infotech Inc"
                }
            },
            {
                "_index": "prerfixsearch",
                "_type": "_doc",
                "_id": "160d01183-a308-4408-8ac1-a85da950f285edefaca2-0b68-41c6-ba34-21bbef57f84f",
                "_score": 1,
                "_source": {
                    "company_name": "Aziia Avto Ust-Kamenogorsk OOO"
                }
            },
            {
                "_index": "prerfixsearch",
                "_type": "_doc",
                "_id": "1da878175-7db5-4332-baa7-ac47bd39b646f81c1770-7ae1-4536-baed-0a4f6b20fa38",
                "_score": 1,
                "_source": {
                    "company_name": "AZURE Midstream Partners LP"
                }
            }
        ]
    }
}

Explanation: The OP did not originally mention that the 4th doc should be excluded from the search results, which is why I suggested creating a text field, so that individual tokens are generated. Now that the requirement is only a prefix search, we don't need the individual tokens. We want just 1 token, but it should be lowercased to support case-insensitive search, which is why I applied the custom normalizer to the company_name field.
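
The difference between the two approaches can be sketched in plain Python. This is only a rough simulation under stated assumptions: `text_tokens` approximates a text field by splitting on non-alphanumeric characters and lowercasing, `normalized_keyword` approximates the keyword field with the lowercase normalizer, and `prefix_search` is a hypothetical stand-in for the prefix query. None of these are Elasticsearch APIs.

```python
import re

DOCS = [
    "Aziia Avto Ust-Kamenogorsk OOO",
    "AZ Infotech Inc",
    "AZURE Midstream Partners LP",
    "State Oil Fund of the Republic of Azerbaijan",
]


def text_tokens(value):
    """Rough stand-in for a text field: many lowercased tokens per document."""
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", value) if t]


def normalized_keyword(value):
    """Stand-in for a keyword field with a lowercase normalizer: one lowercased token."""
    return [value.lower()]


def prefix_search(docs, tokenizer, prefix):
    """A document matches if any of its tokens starts with the prefix."""
    return [d for d in docs if any(tok.startswith(prefix) for tok in tokenizer(d))]


# Text field: the token "azerbaijan" also starts with "az", so doc 4 matches too.
print(prefix_search(DOCS, text_tokens, "az"))

# Normalized keyword: only whole values that start with "az" match.
print(prefix_search(DOCS, normalized_keyword, "az"))
```

With the per-word tokens all four documents match, while the single lowercased token per document yields only the first three, which is the behaviour the question asked for.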



Source: https://stackoverflow.com/questions/56629313/elastic-search-case-insensitive-query-with-prefix-query
