Scoring by term position in ElasticSearch?

Happy的楠姐 2020-12-05 15:53

I'm implementing an auto-complete index in ElasticSearch and have run into an issue with sorting/scoring. Say I have the following strings in an index:

appl         


        
2 Answers
  • 2020-12-05 16:21

    Here's the solution I ended up with, based on Andrei's answer and expanded to support multiple search terms and additional scoring based on the length of the first word in the result:

    First, define the following custom analyzer in your index's analysis settings (it keeps the entire string as a single token and lowercases it):

    "raw_analyzer": {
        "type": "custom",
        "filter": [
            "lowercase"
        ],
        "tokenizer": "keyword"
    }
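    To see why a single lowercased token helps with "starts with" matching, here is a rough Python approximation of the two analysis chains. It assumes the keyword tokenizer emits the whole input unchanged, and it approximates a standard-style analyzer with a simple regex split; the real standard tokenizer follows Unicode word-boundary rules, so treat this as a sketch only.

```python
import re


def raw_analyze(text: str) -> list:
    """Approximates raw_analyzer: the keyword tokenizer emits the whole
    input as one token, and the lowercase filter lowercases it."""
    return [text.lower()]


def standard_like_analyze(text: str) -> list:
    """Rough stand-in for a standard-style analyzer (assumption: split on
    runs of word characters and lowercase; the real standard tokenizer
    follows Unicode word-boundary rules)."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def raw_prefix_match(name: str, prefix: str) -> bool:
    """A prefix query against the single raw token succeeds only when the
    whole string starts with the prefix."""
    return any(tok.startswith(prefix.lower()) for tok in raw_analyze(name))


print(raw_analyze("Apple Pie"))                  # ['apple pie']
print(standard_like_analyze("Apple Pie"))        # ['apple', 'pie']
print(raw_prefix_match("Apple Pie", "apple"))    # True
print(raw_prefix_match("Big Apple", "apple"))    # False
```

    With the raw token, a prefix like "apple" only hits names that begin with it, while the word-level tokens would also match "Big Apple".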
    

    Second, define your search field mapping like so (mine's named "name"):

    "name": {
        "type": "string",
        "analyzer": "english",
        "fields": {
            "raw": {
                "type": "string",
                "index_analyzer": "raw_analyzer",
                "search_analyzer": "standard"
            }
        }
    },
    "_nameFirstWordLength": {
        "type": "long"
    }
    

    Third, when populating the index, compute _nameFirstWordLength with logic like the following (mine's in C#):

    _nameFirstWordLength = fi.Name.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries)[0].Length
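    For readers not on C#, here is a minimal Python port of that one-liner. The function name is hypothetical, and the None return for an empty or all-space name is an addition: the C# indexing `[0]` would throw in that case, and how you index such documents is up to you.

```python
from typing import Optional


def name_first_word_length(name: str) -> Optional[int]:
    """Port of the C# line:
    fi.Name.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries)[0].Length

    Splits on the space character only (as the C# call does), drops empty
    entries, and returns the length of the first word. For an empty or
    all-space name the C# version would throw, so this returns None.
    """
    words = [w for w in name.split(" ") if w]
    return len(words[0]) if words else None


print(name_first_word_length("Apple"))             # 5
print(name_first_word_length("Applebee's Grill"))  # 10
print(name_first_word_length("   "))               # None
```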
    

    Finally, do your search as follows:

    {
       "query":{
          "bool":{
             "must":{
                "match_phrase_prefix":{
                   "name":{
                      "query":"apple"
                   }
                }
             },
             "should":{
                "function_score":{
                   "query":{
                      "query_string":{
                         "fields":[
                            "name.raw"
                         ],
                         "query":"apple*"
                      }
                   },
                   "script_score":{
                      "script":"100/doc['_nameFirstWordLength'].value"
                   },
                   "boost_mode":"replace"
                }
             }
          }
       }
    }
    

    I'm using match_phrase_prefix so that partial matches are supported, such as "ap" matching "apple". The bool query's should clause, a query_string query against name.raw, gives a higher score to results whose name starts with one of the search terms. (In my code I pre-process the search string for that second query only, adding a "*" after every word.) Finally, wrapping that second query in a function_score whose script uses _nameFirstWordLength makes the results boosted by the second query rank further by the length of their first word, shorter first: Apple shows before Applebee's, for example.
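    The combined ranking can be simulated in plain Python to check the intended order. This is a sketch under loud assumptions: tokens are lowercased whitespace splits (the real "english" analyzer also stems), every match gets the same base score from the must clause, and Lucene's coord and query normalization are ignored. The sample names and helper functions are made up for illustration.

```python
def first_word_length(name: str) -> int:
    """Length of the first space-separated word (0 if there is none)."""
    words = [w for w in name.split(" ") if w]
    return len(words[0]) if words else 0


def phrase_prefix_match(name: str, query: str) -> bool:
    """Simplified match_phrase_prefix: the query words must appear
    consecutively in the name, with the last word matched as a prefix.
    Assumption: lowercase whitespace tokens, no stemming."""
    name_toks = name.lower().split()
    q_toks = query.lower().split()
    if not q_toks:
        return False
    n = len(q_toks)
    for i in range(len(name_toks) - n + 1):
        window = name_toks[i:i + n]
        if window[:-1] == q_toks[:-1] and window[-1].startswith(q_toks[-1]):
            return True
    return False


def starts_with_any_term(name: str, query: str) -> bool:
    """Mirrors query_string "word1* word2*" against name.raw: the whole
    lowercased name is one token, so a hit means the name starts with one
    of the search words."""
    raw = name.lower()
    return any(raw.startswith(t) for t in query.lower().split())


def rank(names: list, query: str, base_score: float = 1.0) -> list:
    scored = []
    for name in names:
        if not phrase_prefix_match(name, query):
            continue  # the bool "must" clause excludes this name
        # Assumption: every match gets the same score from the must clause.
        score = base_score
        length = first_word_length(name)
        if starts_with_any_term(name, query) and length:
            # The "should" clause adds 100 / first-word length
            # (function_score with boost_mode "replace").
            score += 100 / length
        scored.append((score, name))
    # Highest score first; ties keep their input order (sort is stable).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


names = ["Applebee's", "Big Apple Bagels", "Apple", "Apple Pie"]
print(rank(names, "apple"))
# ['Apple', 'Apple Pie', "Applebee's", 'Big Apple Bagels']
```

    Names that start with the search word outrank "Big Apple Bagels", and among those the shorter first word wins. "Apple" and "Apple Pie" tie because their first words have the same length.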

  • 2020-12-05 16:22

    You can sort with a custom script, like this:

    {
      "query": {
        "match": {
          "content": "donut"
        }
      },
      "sort": {
        "_script": {
          "script": "termInfo=_index['content'].get('donut',_OFFSETS);for(pos in termInfo){return _score+pos.startOffset};",
          "type": "number",
          "order": "asc"
        }
      }
    }
    

    In there I returned the original _score plus the startOffset of the term's first occurrence. If you need something else, play with those values and the original scoring to come up with a value that suits your needs.
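    The effect of that sort key can be reproduced outside Elasticsearch. This sketch assumes tokens are lowercased runs of word characters, computes character offsets from the raw text (the real script reads them from the index, which is why the mapping must store offsets), and takes relevance scores as a hypothetical input dict.

```python
import re
from typing import Optional


def term_offsets(text: str, term: str) -> list:
    """Rough analogue of _index['content'].get(term, _OFFSETS): the
    (start, end) character offsets of each occurrence of term, in order.
    Assumption: tokens are lowercased runs of word characters."""
    return [(m.start(), m.end()) for m in re.finditer(r"\w+", text)
            if m.group().lower() == term]


def sort_by_position(docs: list, term: str,
                     scores: Optional[dict] = None) -> list:
    """Ascending sort on _score + startOffset of the first occurrence,
    like the _script sort above. scores maps doc -> relevance score
    (hypothetical input; missing entries count as 0)."""
    scores = scores or {}
    keyed = []
    for doc in docs:
        offsets = term_offsets(doc, term)
        if not offsets:
            continue  # the match query would not return this doc
        # The script returns inside the loop, i.e. on the first occurrence.
        first_start = offsets[0][0]
        keyed.append((scores.get(doc, 0.0) + first_start, doc))
    keyed.sort(key=lambda pair: pair[0])
    return [doc for _, doc in keyed]


docs = ["I like a donut", "donut shop", "a big donut", "no pastry here"]
print(sort_by_position(docs, "donut"))
# ['donut shop', 'a big donut', 'I like a donut']
```

    Because the order is ascending, adding _score raises the key of more relevant documents and pushes them down the list; scaling or subtracting the score are ways to "play with those values".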

    Or you can do something like this:

    {
      "query": {
        "function_score": {
          "query": {
            "match": {
              "content": "donut"
            }
          },
          "script_score": {
            "script": "termInfo=_index['content'].get('donut',_OFFSETS);for(pos in termInfo){return pos.startOffset};"
          },
          "boost_mode": "replace"
        }
      },
      "sort": [
        {
          "_score": "asc"
        }
      ]
    }
    

    In either case, the mapping for that field needs to include this:

    "content": {
      "type": "string",
      "index_options": "offsets"
    }
    

    That is, index_options needs to be set to offsets. Here are more details about this.
