Elasticsearch paginating a sorted, aggregated result

太阳男子 2020-12-11 04:29

As far as I'm aware, there isn't a way to do something like the following in Elasticsearch:

SELECT * FROM myindex
GROUP BY agg_field1, agg_field2, agg_fiel

2 Answers
  • 2020-12-11 04:34

    The composite aggregation might help here, as it allows you to group by multiple fields and then paginate over the results. The only thing it doesn't let you do is jump to a given offset, but you can emulate that by iterating over the pages from your client code if necessary.

    So here is a sample query to do that:

    POST testindex6/_search
    {
      "size": 0,
      "aggs": {
        "my_buckets": {
          "composite": {
            "size": 100,
            "sources": [
              {
                "store": {
                  "terms": {
                    "field": "store_url"
                  }
                }
              },
              {
                "status": {
                  "terms": {
                    "field": "status",
                    "order": "desc"
                  }
                }
              },
              {
                "title": {
                  "terms": {
                    "field": "title",
                    "order": "asc"
                  }
                }
              }
            ]
          },
          "aggs": {
            "hits": {
              "top_hits": {
                "size": 100
              }
            }
          }
        }
      }
    }
    

    In the response you'll see an after_key structure:

      "after_key": {
        "store": "http://google.com1087",
        "status": "OK1087",
        "title": "Titanic1087"
      },
    

    It acts as a cursor: pass it as the after parameter of the composite aggregation in your subsequent queries, like this:

    POST testindex6/_search
    {
      "size": 0,
      "aggs": {
        "my_buckets": {
          "composite": {
            "size": 100,
            "sources": [
              {
                "store": {
                  "terms": {
                    "field": "store_url"
                  }
                }
              },
              {
                "status": {
                  "terms": {
                    "field": "status",
                    "order": "desc"
                  }
                }
              },
              {
                "title": {
                  "terms": {
                    "field": "title",
                    "order": "asc"
                  }
                }
              }
            ],
            "after": {
              "store": "http://google.com1087",
              "status": "OK1087",
              "title": "Titanic1087"
            }
          },
          "aggs": {
            "hits": {
              "top_hits": {
                "size": 100
              }
            }
          }
        }
      }
    }
    

    This gives you the next 100 buckets. Repeat with each new after_key until the response contains no more buckets. Hopefully this helps.
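The paging loop described above can be sketched in Python. This is a minimal sketch, not a definitive implementation: `search` is a hypothetical callable (not part of any Elasticsearch client) that sends the request body to the `_search` endpoint and returns the parsed JSON response as a dict, and `iter_composite_buckets` is a name made up for illustration.

```python
import copy
from typing import Any, Callable, Dict, Iterator


def iter_composite_buckets(
    search: Callable[[Dict[str, Any]], Dict[str, Any]],
    body: Dict[str, Any],
    agg_name: str = "my_buckets",
) -> Iterator[Dict[str, Any]]:
    """Yield every bucket of a composite aggregation, page by page.

    `search` is an assumed helper: it takes the request body and returns
    the parsed _search response as a dict.
    """
    body = copy.deepcopy(body)  # do not mutate the caller's request
    composite = body["aggs"][agg_name]["composite"]
    while True:
        result = search(body)["aggregations"][agg_name]
        buckets = result.get("buckets", [])
        if not buckets:
            return  # an empty page means everything has been consumed
        yield from buckets
        after_key = result.get("after_key")
        if after_key is None:
            return  # no cursor returned: this was the last page
        composite["after"] = after_key  # resume after the last bucket seen
```

Since the composite aggregation cannot jump to an offset, one way to emulate it from client code is to discard the leading buckets, for example with `itertools.islice(iter_composite_buckets(search, body), 250, 350)`. Every skipped page is still fetched, so this only suits moderate offsets.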

    UPDATE:

    If you want to know how many buckets there will be in total, the composite aggregation won't give you that number. However, since the composite aggregation is essentially a cartesian product of the values of all the fields in its sources, you can get an approximation of that total by also returning the [cardinality](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html) of each field used in the composite aggregation and multiplying them together.

      "aggs": {
        "my_buckets": {
          "composite": {
            ...
          }
        },
        "store_cardinality": {
          "cardinality": {
            "field": "store_url"
          }
        },
        "status_cardinality": {
          "cardinality": {
            "field": "status"
          }
        },
        "title_cardinality": {
          "cardinality": {
            "field": "title"
          }
        }
      }
    

    We can then estimate the total number of buckets by multiplying the values returned in store_cardinality, status_cardinality and title_cardinality. Strictly speaking this is an upper bound, since only combinations of values that actually occur together in a document produce a bucket. The estimate degrades on high-cardinality fields but works reasonably well on low-cardinality ones.
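The multiplication can be sketched as follows. `estimate_total_buckets` is a hypothetical helper, and the numbers in the sample response are invented for illustration. The only assumption about Elasticsearch is that each cardinality aggregation comes back as an object with a `value` field.

```python
from math import prod
from typing import Any, Dict, Iterable


def estimate_total_buckets(
    aggregations: Dict[str, Any], cardinality_aggs: Iterable[str]
) -> int:
    """Multiply the `value` of each named cardinality aggregation."""
    return prod(int(aggregations[name]["value"]) for name in cardinality_aggs)


# Made-up "aggregations" section of a response, for illustration only.
sample_aggregations = {
    "my_buckets": {"buckets": []},
    "store_cardinality": {"value": 12},
    "status_cardinality": {"value": 3},
    "title_cardinality": {"value": 40},
}

estimate = estimate_total_buckets(
    sample_aggregations,
    ["store_cardinality", "status_cardinality", "title_cardinality"],
)
print(estimate)  # 12 * 3 * 40 = 1440
```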

  • 2020-12-11 04:55

    Field collapsing is the answer.

    The field collapsing feature is used when we want to group the hits on a specific field (as in GROUP BY agg_field).

    Before Elastic 6, the way to group on a field was to use an aggregation, an approach that lacked efficient paging.

    But now, with field collapsing provided out of the box by Elasticsearch, it is pretty easy.

    Below is a sample query with field collapsing, taken from the Elasticsearch documentation.

    GET /twitter/_search
    {
      "query": {
        "match": {
          "message": "elasticsearch"
        }
      },
      "collapse": {
        "field": "user",
        "inner_hits": {
          "name": "last_tweets",
          "size": 5,
          "sort": [{ "date": "asc" }]
        },
        "max_concurrent_group_searches": 4
      },
      "sort": ["likes"]
    }
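To page through collapsed results, the usual `from` and `size` search parameters can be combined with `collapse`. The sketch below only builds the request body. `collapsed_page` is a hypothetical helper, and the query, field names and page size are carried over from the sample above or chosen for illustration.

```python
from typing import Any, Dict


def collapsed_page(page: int, page_size: int = 10) -> Dict[str, Any]:
    """Build the search body for one page (0-based) of collapsed results."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return {
        "from": page * page_size,  # offset of the first collapsed result
        "size": page_size,         # number of collapsed results per page
        "query": {"match": {"message": "elasticsearch"}},
        "collapse": {"field": "user"},
        "sort": ["likes"],
    }


# Body for the third page, with 20 groups per page.
third_page_body = collapsed_page(2, page_size=20)
```

Two caveats apply: the total hit count in the response counts matching documents rather than distinct groups, and `from` + `size` is subject to the index's result window limit, so deep paging over many groups may still call for the composite aggregation.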
