Scoring by term position in ElasticSearch?

Happy的楠姐 2020-12-05 15:53

I'm implementing an auto-complete index in ElasticSearch and have run into an issue with sorting/scoring. Say I have the following strings in an index:

appl         


        
2 Answers
  •  旧时难觅i
    2020-12-05 16:21

    Here's the solution I ended up with, based on Andrei's answer and expanded to support multiple search terms and additional scoring based on length of the first word in the result:

    First, define the following custom analyzer (it keeps the entire string as a single token and lowercases it):

    "raw_analyzer": {
        "type": "custom",
        "filter": [
            "lowercase"
        ],
        "tokenizer": "keyword"
    }
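
To make the effect of this analyzer concrete, here is a minimal Python sketch that only emulates it. The helper names `raw_analyze`, `word_analyze`, and `has_prefix_match` are hypothetical, and `word_analyze` is a rough stand-in for a word-level analyzer (it lowercases and splits on whitespace but does no stemming). It is not how ElasticSearch implements analysis.

```python
from typing import List


def raw_analyze(text: str) -> List[str]:
    """Emulate keyword tokenizer + lowercase filter: the whole string is one token."""
    return [text.lower()]


def word_analyze(text: str) -> List[str]:
    """Rough stand-in for a word-level analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def has_prefix_match(tokens: List[str], prefix: str) -> bool:
    """True if any indexed token starts with the lowercased prefix."""
    return any(token.startswith(prefix.lower()) for token in tokens)


name = "Green Apple"
# Word-level tokens: the prefix "apple" matches the second word.
matches_any_word = has_prefix_match(word_analyze(name), "apple")
# Single raw token "green apple": the prefix "apple" does not match,
# because the string as a whole does not start with it.
matches_start_of_name = has_prefix_match(raw_analyze(name), "apple")
```

Because the raw token is the entire lowercased string, a prefix match against it succeeds only when the name itself begins with the search term, which is the property the scoring below relies on.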
    

    Second, define your search field mapping like so (mine's named "name"):

    "name": {
        "type": "string",
        "analyzer": "english",
        "fields": {
            "raw": {
                "type": "string",
                "index_analyzer": "raw_analyzer",
                "search_analyzer": "standard"
            }
        }
    },
    "_nameFirstWordLength": {
        "type": "long"
    }
    

    Third, when populating the index, compute the length of the first word of the name and store it in _nameFirstWordLength (my code is in C#):

    _nameFirstWordLength = fi.Name.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries)[0].Length
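
For readers not working in C#, the same computation might look like this in Python. This is a sketch: `first_word_length` and `to_document` are hypothetical helper names, and raising `ValueError` for a name with no words is an assumption (the C# line above would throw on such input, and a stored length of 0 would make the script used later divide by zero).

```python
def first_word_length(name: str) -> int:
    """Length of the first space-separated word, ignoring empty entries."""
    # Mirrors Split(' ', RemoveEmptyEntries): split on spaces, drop empty pieces.
    words = [word for word in name.split(" ") if word]
    if not words:
        # Assumption: reject names without any word rather than store 0.
        raise ValueError("name contains no words")
    return len(words[0])


def to_document(name: str) -> dict:
    """Build the document body to index, including the precomputed length."""
    return {"name": name, "_nameFirstWordLength": first_word_length(name)}
```

For example, `to_document("Apple Pie")` yields a document whose `_nameFirstWordLength` is 5.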
    

    Finally, do your search as follows:

    {
       "query":{
          "bool":{
             "must":{
                "match_phrase_prefix":{
                   "name":{
                      "query":"apple"
                   }
                }
             },
             "should":{
                "function_score":{
                   "query":{
                      "query_string":{
                         "fields":[
                            "name.raw"
                         ],
                         "query":"apple*"
                      }
                   },
                   "script_score":{
                      "script":"100/doc['_nameFirstWordLength'].value"
                   },
                   "boost_mode":"replace"
                }
             }
          }
       }
    }
    

    I'm using match_phrase_prefix so that partial matches are supported, such as "ap" matching "apple". The should clause runs a second query_string query against name.raw, which gives a higher score to results whose name starts with one of the search terms (in my code I pre-process the search string, just for that second query, to add a "*" after every word). Finally, wrapping that second query in a function_score whose script uses the value of _nameFirstWordLength causes the results up-scored by the second query to be further ordered by the length of their first word: the shorter the first word, the higher the score, so Apple shows before Applebee's, for example.
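
The pre-processing step and the query assembly described above can be sketched in Python as follows. The function names `add_wildcards` and `build_query` are hypothetical, and the sketch assumes the search string contains no characters that are reserved in query_string syntax (it does no escaping).

```python
import json


def add_wildcards(search: str) -> str:
    """Append "*" to every whitespace-separated word: "apple pie" -> "apple* pie*"."""
    return " ".join(word + "*" for word in search.split())


def build_query(search: str) -> dict:
    """Assemble the search body shown above for an arbitrary search string."""
    return {
        "query": {
            "bool": {
                # Required clause: prefix-style match on the analyzed field.
                "must": {"match_phrase_prefix": {"name": {"query": search}}},
                # Optional clause: extra score when the raw name starts with a term.
                "should": {
                    "function_score": {
                        "query": {
                            "query_string": {
                                "fields": ["name.raw"],
                                "query": add_wildcards(search),
                            }
                        },
                        # Shorter first word gives a larger replacement score.
                        "script_score": {
                            "script": "100/doc['_nameFirstWordLength'].value"
                        },
                        "boost_mode": "replace",
                    }
                },
            }
        }
    }


body = json.dumps(build_query("apple"), indent=2)
```

With this script, a name whose first word is "Apple" (5 characters) contributes 100/5 = 20 from the should clause, while "Applebee's" (10 characters) contributes 100/10 = 10, which is why the shorter name ranks first.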
