Limit ElasticSearch aggregation to top n query results

粉色の甜心 2020-12-09 17:24

I have a set of 2.8 million docs with sets of tags that I'm querying with ElasticSearch, but many of these docs can be grouped together by one ID. I want to query my data u

3 Answers
  • 2020-12-09 18:08

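    You can use the min_doc_count parameter of the terms aggregation, which returns only buckets that contain at least that many documents. Note that this filters buckets by document count; it does not restrict the aggregation to the top-scoring hits.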

    {
      "aggs": {
        "products": {
          "terms": {
            "field": "product",
            "min_doc_count": 100
          }
        }
      }
    }
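
    To make the effect of min_doc_count concrete, here is a minimal plain-Python sketch. It is not Elasticsearch code: the function name terms_with_min_doc_count and the sample product values are hypothetical, and the sketch only mimics how buckets below the threshold are dropped.

```python
from collections import Counter


def terms_with_min_doc_count(values, min_doc_count):
    """Mimic a terms aggregation that drops buckets below min_doc_count."""
    counts = Counter(values)
    # Keep only buckets with enough documents, largest bucket first.
    return [(term, n) for term, n in counts.most_common() if n >= min_doc_count]


# Hypothetical "product" field values, one entry per matching document.
products = ["a"] * 5 + ["b"] * 3 + ["c"] * 1
buckets = terms_with_min_doc_count(products, min_doc_count=3)
print(buckets)  # [('a', 5), ('b', 3)]
```

    Every document still takes part in the count; only small buckets are hidden from the response.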
    
  • 2020-12-09 18:09

    The size parameter can be set to define how many term buckets should be returned out of the overall terms list.

    By default, the coordinating node asks each shard for its own top size term buckets and, once all shards respond, reduces the results to the final list returned to the client. If the number of unique terms is greater than size, the returned list can therefore be slightly inaccurate: term counts may be slightly off, and a term that belongs in the top size buckets may be missing altogether.
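
    The per-shard reduction described above can be illustrated with a small plain-Python simulation. This is a sketch under simplifying assumptions, not how Elasticsearch is implemented: the helper names and the two shard contents are made up, and each shard is assumed to return exactly size buckets.

```python
from collections import Counter


def shard_top_terms(docs, size):
    """Each shard reports only its own top `size` terms."""
    return Counter(docs).most_common(size)


def coordinate(shards, size):
    """Merge the per-shard lists and keep the overall top `size` terms."""
    merged = Counter()
    for docs in shards:
        for term, count in shard_top_terms(docs, size):
            merged[term] += count
    return merged.most_common(size)


# Hypothetical term values held by two shards.
shards = [
    ["a"] * 5 + ["b"] * 4 + ["c"] * 3,
    ["a"] * 6 + ["c"] * 5 + ["b"] * 3,
]

approx = coordinate(shards, size=2)
exact = Counter(t for docs in shards for t in docs).most_common(2)
print(approx)  # [('a', 11), ('c', 5)]
print(exact)   # [('a', 11), ('c', 8)]
```

    Term c is undercounted because the first shard never reported it: it was outside that shard's local top 2.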

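    In Elasticsearch versions before 5.0, setting size to 0 sets it to Integer.MAX_VALUE. From 5.0 onward a size of 0 is no longer accepted, so request an explicit number of buckets instead.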

    Here is an example that returns the top 100 buckets:

    {
      "aggs": {
        "products": {
          "terms": {
            "field": "product",
            "size": 100
          }
        }
      }
    }
    

    See the Elasticsearch terms aggregation documentation for more information.

  • 2020-12-09 18:18

    Sampler aggregation:

    A filtering aggregation used to limit any sub aggregations' processing to a sample of the top-scoring documents.

    {
      "aggs": {
        "bestDocs": {
          "sampler": {
            "shard_size": 100
          },
          "aggs": {
            "bestBuckets": {
              "terms": {
                "field": "id"
              }
            }
          }
        }
      }
    }
    

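    This query limits the sub-aggregation to the top 100 documents per shard and then buckets them by ID. Because shard_size applies to each shard separately, an index with several shards can contribute more than 100 documents in total.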

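    Optionally, you can use the field or script and max_docs_per_value settings to control the maximum number of documents collected on any one shard which share a common value. In Elasticsearch 5.0 and later, these diversity settings belong to the separate diversified_sampler aggregation.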
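
    For completeness, the sampler only makes sense alongside a scoring query, since "top" means top-scoring. Below is a minimal Python sketch that assembles such a request body as a plain dict. The helper name build_sampler_request and the match query on a tags field are assumptions for illustration; sending the body with a client is left out.

```python
import json


def build_sampler_request(query, shard_size=100, bucket_field="id"):
    """Build a search body whose terms aggregation only sees the
    top-scoring `shard_size` documents on each shard."""
    return {
        "size": 0,  # return no hits, only the aggregation results
        "query": query,
        "aggs": {
            "bestDocs": {
                "sampler": {"shard_size": shard_size},
                "aggs": {
                    "bestBuckets": {"terms": {"field": bucket_field}}
                },
            }
        },
    }


# Hypothetical query: the "tags" field and its value are assumptions.
body = build_sampler_request({"match": {"tags": "example"}})
print(json.dumps(body, indent=2))
```

    The resulting JSON can be sent as the body of a _search request; the aggregation names bestDocs and bestBuckets match the example above.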
