Elastic Search: “Exact” phrase matching with wildcards

Posted by 这一生的挚爱 on 2020-01-05 08:17:32

Question


I'm using Elasticsearch to search for names in a genealogical database. One of the search options is "exact search". The problem is that my clients want wildcards to be allowed in exact search. The difference between the two modes is that inexact search returns fuzzy matches, whereas exact search should return only the exact phrase searched for, with wildcards as the sole exception (no fuzzy results).

To enable wildcards, the search currently uses a query_string query. This is the format of the exact search:

{
  "query": {
      "filtered": {
          "query": {
              "bool": {
                  "must": [
                      {
                          "dis_max": {
                              "queries": [
                                  {
                                      "match": {
                                          "first_name": {
                                              "type": "phrase",
                                              "query": "mary c.",
                                              "fuzziness": 0,
                                              "analyzer": "standard",
                                              "boost": 2
                                          }
                                      }
                                  },
                                  {
                                      "query_string": {
                                          "query": "mary c.",
                                          "default_field": "first_name",
                                          "analyzer": "standard",
                                          "fuzzy_min_sim": 0,
                                          "boost": 0.5
                                      }
                                  }
                              ]
                          }
                      }
                  ]
              }
          }
      }
  }
}

I have a boost so that fully exact matches are returned first, which works fine. However, after the exact matches I get results like "Mary F." or "James C." (using a search for "Mary C." as the example). My clients don't want this, as it's not exact enough: I should ONLY get results with the name "Mary C.", or, if I search for "Mar* C.", I should get "Mary C." or "Martin C.", but I shouldn't get "James C." or "Mary F."
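To pin down the behaviour I'm after, here is a minimal Python sketch of the matching rule, independent of Elasticsearch. The function names (`tokenize`, `exact_match`, `prefix_match`) are hypothetical, and the tokenizer is only a rough stand-in for the standard analyzer (lowercase, split on punctuation and whitespace), so treat it as a statement of the requirement rather than of how Elasticsearch works internally.

```python
import re
from fnmatch import fnmatchcase


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumption: this only approximates the standard analyzer. The
    wildcard characters * and ? are kept as part of a token.
    """
    return re.findall(r"[\w*?]+", text.lower())


def exact_match(pattern, name):
    """True if the name has the same number of tokens as the pattern and
    every name token matches the pattern token at the same position."""
    p, n = tokenize(pattern), tokenize(name)
    return len(p) == len(n) and all(fnmatchcase(t, q) for q, t in zip(p, n))


def prefix_match(pattern, name):
    """Weaker rule: the name only has to START with the pattern tokens."""
    p, n = tokenize(pattern), tokenize(name)
    return len(p) <= len(n) and all(fnmatchcase(t, q) for q, t in zip(p, n))


if __name__ == "__main__":
    for candidate in ["Mary C.", "Martin C.", "James C.", "Mary F."]:
        print(candidate, exact_match("Mar* C.", candidate))
```

Under this rule "Mar* C." accepts "Mary C." and "Martin C." and rejects "James C." and "Mary F.", which is the result set my clients expect.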

I added "default_operator": "AND" to the query_string query, like so:

{
    "query_string": {
        "query": "mary c.",
        "default_field": "first_name",
        "analyzer": "standard",
        "fuzzy_min_sim": 0,
        "boost": 0.5,
        "default_operator": "AND"
    }
}

This is better, but still not quite right: now I only get results that have both "Mary" AND "C." in the first name, but some of them are "Mary Jane C." and "Mary, widow of James C."

Is there any way I can make the query_string match more exactly? At the very least, the phrase prefix should match, so "Mary C." shouldn't return "Mary, widow of James C." but only "Mary C. ....". Ideally, "Mary C." should ONLY match "Mary C.", and "Mar* C." should match "Mary C.", "Martin C.", etc.
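One direction that might get at least the phrase-prefix behaviour is span queries instead of query_string. The sketch below is untested against my index. It builds a `span_first` query around a `span_near` (slop 0, in order), using `span_term` for plain tokens and `span_multi` wrapping a `wildcard` query for tokens containing `*` or `?`. The helper name `build_span_query` is hypothetical, and the sketch assumes the field is indexed with the standard analyzer, so terms are lowercased by hand because wildcard terms are not analyzed.

```python
import json
import re


def build_span_query(field, phrase):
    """Build a span query that requires the phrase tokens to appear
    adjacent, in order, at the very start of the field.

    Assumption: tokens are produced by a rough imitation of the
    standard analyzer (lowercase, split on punctuation/whitespace).
    """
    tokens = re.findall(r"[\w*?]+", phrase.lower())
    if not tokens:
        raise ValueError("phrase contains no searchable tokens")

    clauses = []
    for tok in tokens:
        if "*" in tok or "?" in tok:
            # A wildcard token has to be wrapped in span_multi to be
            # usable inside other span queries.
            clauses.append(
                {"span_multi": {"match": {"wildcard": {field: {"value": tok}}}}}
            )
        else:
            clauses.append({"span_term": {field: tok}})

    if len(clauses) == 1:
        near = clauses[0]
    else:
        # slop 0 + in_order: tokens must be adjacent and in query order.
        near = {"span_near": {"clauses": clauses, "slop": 0, "in_order": True}}

    # span_first: the whole match must end within the first len(tokens)
    # positions, i.e. it must begin at the start of the field.
    return {"span_first": {"match": near, "end": len(tokens)}}


if __name__ == "__main__":
    print(json.dumps({"query": build_span_query("first_name", "Mar* C.")}, indent=2))
```

This should only anchor the phrase to the start of the field, so "Mary C. Smith" would presumably still match a search for "Mary C."; restricting results to the whole field would need something more, such as a separate non-analyzed copy of the name.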

Source: https://stackoverflow.com/questions/27275213/elastic-search-exact-phrase-matching-with-wildcards
