Query all unique values of a field with Elasticsearch

陌清茗 2020-12-29 19:52

How do I search for all unique values of a given field with Elasticsearch?

I am looking for the equivalent of a SQL query such as select full_name from authors, but returning each value only once.

5 Answers
  •  盖世英雄少女心
    2020-12-29 20:28

    Intuition: In SQL parlance:

    Select distinct full_name from authors;

    is equivalent to

    Select full_name from authors group by full_name;

    So, we can use the aggregation syntax in Elasticsearch to find distinct entries.

    Assume the following documents are stored in Elasticsearch:

    [{
        "author": "Brian Kernighan"
      },
      {
        "author": "Charles Dickens"
      }]
    

    What did not work: a plain terms aggregation on the text field

    {
      "aggs": {
        "full_name": {
          "terms": {
            "field": "author"
          }
        }
      }
    }
    

    I got the following error:

    {
      "error": {
        "root_cause": [
          {
            "reason": "Fielddata is disabled on text fields by default...",
            "type": "illegal_argument_exception"
          }
        ]
      }
    }
    

    What worked like a charm: appending .keyword to the field name, which targets the keyword sub-field that the default dynamic mapping creates for string fields

    {
      "aggs": {
        "full_name": {
          "terms": {
            "field": "author.keyword"
          }
        }
      }
    }
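
    A terms aggregation returns only the 10 largest buckets unless its size is set, so the query above lists every distinct author only when there are few of them. The Python sketch below builds the same request with an explicit bucket limit and "size": 0 at the top level to skip the search hits, and also shows a composite aggregation that pages through all values. The function names, the search callable (any function that sends a body to the _search endpoint and returns the parsed JSON), and the default of 1000 are assumptions for illustration, not part of the original answer.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def build_terms_body(field: str, max_buckets: int = 1000) -> Dict[str, Any]:
    """Terms aggregation over `field`, returning at most `max_buckets` values."""
    return {
        "size": 0,  # no search hits, only the aggregation result
        "aggs": {
            "full_name": {"terms": {"field": field, "size": max_buckets}}
        },
    }


def build_composite_body(
    field: str, page_size: int = 1000, after: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Composite aggregation over `field`; `after` resumes from a previous page."""
    composite: Dict[str, Any] = {
        "size": page_size,
        "sources": [{"full_name": {"terms": {"field": field}}}],
    }
    if after is not None:
        composite["after"] = after
    return {"size": 0, "aggs": {"full_name": {"composite": composite}}}


def iter_unique_values(
    search: Callable[[Dict[str, Any]], Dict[str, Any]],
    field: str,
    page_size: int = 1000,
) -> Iterator[Any]:
    """Yield every distinct value of `field`, one composite page at a time.

    `search` is a hypothetical callable supplied by the caller.
    """
    after = None
    while True:
        response = search(build_composite_body(field, page_size, after))
        agg = response["aggregations"]["full_name"]
        for bucket in agg["buckets"]:
            yield bucket["key"]["full_name"]
        after = agg.get("after_key")
        # The last page has no buckets and no after_key.
        if after is None or not agg["buckets"]:
            return
```

    For a field with a handful of values, build_terms_body("author.keyword") is enough; the composite variant is worth the extra round trips only when the number of distinct values is large or unknown.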
    

    And the sample output could be:

    {
      "aggregations": {
        "full_name": {
          "buckets": [
            {
              "doc_count": 372,
              "key": "Charles Dickens"
            },
            {
              "doc_count": 283,
              "key": "Brian Kernighan"
            }
          ],
          "doc_count": 1000
        }
      }
    }
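
    Each bucket's key is one distinct value of the field. As a minimal sketch, assuming the response has already been parsed into a Python dict shaped like the sample above, the values can be pulled out as follows. The helper name unique_values is hypothetical.

```python
from typing import Any, Dict, List


def unique_values(response: Dict[str, Any], agg_name: str = "full_name") -> List[Any]:
    """Return the bucket keys of a terms aggregation, in response order."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Response trimmed to the fields used here, copied from the sample output.
sample = {
    "aggregations": {
        "full_name": {
            "buckets": [
                {"doc_count": 372, "key": "Charles Dickens"},
                {"doc_count": 283, "key": "Brian Kernighan"},
            ]
        }
    }
}

print(unique_values(sample))  # ['Charles Dickens', 'Brian Kernighan']
```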
    

    Bonus tip:

    Let us assume the field in question is nested as follows:

    [{
        "authors": [{
            "details": [{
                "name": "Brian Kernighan"
              }]
          }]
      },
      {
        "authors": [{
            "details": [{
                "name": "Charles Dickens"
              }]
          }]
      }
    ]
    

    Now the correct query becomes:

    {
      "aggregations": {
        "full_name": {
          "aggregations": {
            "author_details": {
              "terms": {
                "field": "authors.details.name"
              }
            }
          },
          "nested": {
            "path": "authors.details"
          }
        }
      },
      "size": 0
    }
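
    In the nested case the buckets sit one level deeper in the response, under the name of the inner aggregation. The Python sketch below builds the query above and reads its result. It assumes authors.details is mapped with type nested and that authors.details.name can be aggregated on directly (a text field would need the .keyword suffix here as well). The function names and the max_buckets default are hypothetical.

```python
from typing import Any, Dict, List


def build_nested_terms_body(
    path: str, field: str, max_buckets: int = 1000
) -> Dict[str, Any]:
    """Terms aggregation on `field`, wrapped in a nested aggregation on `path`."""
    return {
        "size": 0,  # no search hits, only the aggregation result
        "aggregations": {
            "full_name": {
                "nested": {"path": path},
                "aggregations": {
                    "author_details": {
                        "terms": {"field": field, "size": max_buckets}
                    }
                },
            }
        },
    }


def nested_unique_values(response: Dict[str, Any]) -> List[Any]:
    """Read the bucket keys from the inner `author_details` aggregation."""
    inner = response["aggregations"]["full_name"]["author_details"]
    return [bucket["key"] for bucket in inner["buckets"]]


body = build_nested_terms_body("authors.details", "authors.details.name")
```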
    
