Elasticsearch: how to get the top unique values of a field sorted by matching score?

Posted by 纵饮孤独 on 2019-12-23 19:46:49

Question


I have a collection of addresses. Let's simplify and say the only fields are postcode, city, street, streetnumber and name. I'd like to be able to suggest a list of streets when the user enters a postcode, a city and some query for the street.

For example, if the user, in an HTML form, enters:

postcode: 75010
city: Paris
street: rue des

I'd like to get a list of streets like

'rue des petites écuries'
'rue des messageries'
...
'rue du faubourg poissonnière'
...

that I could suggest to the user.

So, I'd like to obtain a list of unique values of the "street" field, sorted according to how well they match my query on the "street" field. I'd like to obtain the 10 best matching streets for this query.

A query returning documents would look like:

{
    "query": {
        "bool": {
            "must": [
                {"term": {"postcode": "75010"}},
                {"term": {"city": "Paris"}},
                {"match": {"street": "rue des"}}
            ]    
        }
     }
}

But of course the same street would appear many times, since each street can occur in many different addresses in the collection.

I tried to use the "aggregation" framework and added an aggs:

{
    "query": {
        "bool": {
            "must": [
                {"term": {"postcode": "75010"}},
                {"term": {"city": "Paris"}},
                {"match": {"street": "rue des"}}
            ]    
        }
     },
     "aggs": {
        "street_agg": {
            "terms": {
                "field": "street",
                "size": 10
             }
         }           
     }
}

The problem is that the buckets are not sorted by score: by default, a terms aggregation sorts them by the number of documents in each bucket.

I'd like to have the buckets sorted by the score of an arbitrary document picked in each bucket (yes, it's enough to get the score from a single document in a bucket since the score depends only on the content of the street field in my example).

How would you achieve that?


Answer 1:


OK, so the solution can actually be found in the question "Elasticsearch aggregation order by top hit score", but only if you read the comment there by Shadocko, which I hadn't.

So here's the solution for anyone interested, and for my future self:

{                                 
    'query': {
        'bool': {
            'must': [
                {'term': {'postcode': '75010'}},
                {'term': {'city': 'Paris'}},
                {'match': {'street.autocomplete': 'rue des'}}
            ]
         }
    },
    'aggs': {
        'street_agg': {
            'terms': {
                'field': 'street',
                'size': 10,
                'order': {
                    'max_score': 'desc'
                }
            },
            'aggs': {
                'max_score': {
                    'max': {'script': '_score'}
                }
            }
        }
    }
}

It's not perfect, since it uses the max aggregation function, which means it does unnecessary computation (just taking the score of one document out of a bucket would have been enough). But it seems there's no "pick one" aggregation function, just min, max, avg and sum, so you have to do that. Well, I think computing the max is not that costly anyway.
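For anyone wiring this into an application, here is a minimal Python sketch of building that request body and reading the street names back out of the response. The function names are hypothetical, and it assumes the standard aggregation response shape (aggregations, then street_agg, then buckets, each with a key). Sending the body to the cluster is left out, since that depends on which client you use.

```python
import json


def build_street_suggest_body(postcode, city, street_query, size=10):
    """Build the search body from the answer above (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"postcode": postcode}},
                    {"term": {"city": city}},
                    # 'street.autocomplete' is the sub-field used in the answer
                    {"match": {"street.autocomplete": street_query}},
                ]
            }
        },
        "aggs": {
            "street_agg": {
                "terms": {
                    "field": "street",
                    "size": size,
                    # order buckets by the sub-aggregation defined below
                    "order": {"max_score": "desc"},
                },
                "aggs": {
                    "max_score": {"max": {"script": "_score"}}
                },
            }
        },
    }


def streets_from_response(response):
    """Return bucket keys (street names) in the order the server sent them."""
    buckets = response["aggregations"]["street_agg"]["buckets"]
    return [bucket["key"] for bucket in buckets]


if __name__ == "__main__":
    body = build_street_suggest_body("75010", "Paris", "rue des")
    print(json.dumps(body, indent=4))
```

The buckets already arrive ordered by max_score, so the client does not need to sort anything; it only has to read the keys in order.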



Source: https://stackoverflow.com/questions/50685190/elasticsearch-how-to-get-the-top-unique-values-of-a-field-sorted-by-matching-sc
