Elasticsearch: group by a document field and count occurrences

Posted by 可紊 on 2020-01-25 06:53:27

Question


My Elasticsearch 6.5.2 index looks like this:

  {
    "_index" : "searches",
    "_type" : "searches",
    "_id" : "cCYuHW4BvwH6Y3jL87ul",
    "_score" : 1.0,
    "_source" : {
      "querySearched" : "telecom"
    }
  },
  {
    "_index" : "searches",
    "_type" : "searches",
    "_id" : "cSYuHW4BvwH6Y3jL_Lvt",
    "_score" : 1.0,
    "_source" : {
      "querySearched" : "telecom"
    }
  },
  {
    "_index" : "searches",
    "_type" : "searches",
    "_id" : "eCb6O24BvwH6Y3jLP7tM",
    "_score" : 1.0,
    "_source" : {
      "querySearched" : "industry"
    }
  }

I would like a query that returns this result:

"result": 
{
"querySearched" : "telecom",
"number" : 2
},
{
"querySearched" : "industry",
"number" : 1
}

I just want to group by value, get the number of occurrences of each, and limit the result to the ten largest counts. I tried aggregations, but the buckets come back empty. Thanks!


Answer 1:


If your mapping is:

PUT /index
{
  "mappings": {
    "doc": {
      "properties": {
        "querySearched": {
          "type": "text",
          "fielddata": true
        }
      }
    }
  }
}

Your query should look like this:

GET index/_search
{
  "size": 0,
  "aggs": {
    "result": {
      "terms": {
        "field": "querySearched",
        "size": 10
      }
    }
  }
}

You need to set "fielddata": true to enable aggregations on a text field. See the Elasticsearch documentation on fielddata for more on that.

    "size": 10 => limits the result to the 10 buckets with the highest counts
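To make the intent of the terms aggregation concrete, here is a minimal local sketch in Python of the same group-by-and-count on the three sample documents from the question. It does not talk to Elasticsearch. The helper name top_terms is hypothetical, and it counts whole field values, which matches these single-word samples.

```python
from collections import Counter

# The three sample documents from the question (only the _source part).
docs = [
    {"querySearched": "telecom"},
    {"querySearched": "telecom"},
    {"querySearched": "industry"},
]


def top_terms(documents, field, size=10):
    """Count each value of `field` and keep the `size` most frequent ones.

    Hypothetical helper for illustration only; it mimics what a terms
    aggregation with "size": 10 is expected to return for whole values.
    """
    counts = Counter(doc[field] for doc in documents if field in doc)
    # most_common() sorts by count, highest first, and truncates to `size`.
    return counts.most_common(size)


if __name__ == "__main__":
    print(top_terms(docs, "querySearched"))
```

For the sample data this gives telecom with a count of 2 and industry with a count of 1, which is the result the question asks for.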

After a short discussion with @Kamal, I feel obligated to point out that enabling "fielddata": true can consume a lot of heap space.

From the Elasticsearch fielddata documentation:

Fielddata can consume a lot of heap space, especially when loading high cardinality text fields. Once fielddata has been loaded into the heap, it remains there for the lifetime of the segment. Also, loading fielddata is an expensive process which can cause users to experience latency hits. This is why fielddata is disabled by default.

Another alternative (a more efficient one):

PUT /index
{
  "mappings": {
    "doc": {
      "properties": {
        "querySearched": {
          "type": "text",
          "fields": {
           "keyword": {
             "type": "keyword",
             "ignore_above": 256
           }
         }
        }
      }
    }
  }
}

Then your aggregation query becomes:

GET index/_search
{
  "size": 0,
  "aggs": {
    "result": {
      "terms": {
        "field": "querySearched.keyword",
        "size": 10
      }
    }
  }
}
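The aggregation returns its counts as buckets with "key" and "doc_count" entries under "aggregations", not in the exact shape shown in the question. The following Python sketch builds the request body above and reshapes such a response into the "querySearched" / "number" form. The function names are hypothetical, and sample_response is hand-written for illustration, not captured from a real cluster.

```python
def build_terms_query(field, size=10, agg_name="result"):
    """Build the request body for a terms aggregation (same shape as above)."""
    return {
        "size": 0,  # return no hits, only the aggregation
        "aggs": {agg_name: {"terms": {"field": field, "size": size}}},
    }


def reshape_buckets(response, agg_name="result"):
    """Turn terms-aggregation buckets into the shape the question asks for."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [
        {"querySearched": bucket["key"], "number": bucket["doc_count"]}
        for bucket in buckets
    ]


# Hand-written example of a terms-aggregation response (assumed, not real output).
sample_response = {
    "aggregations": {
        "result": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "telecom", "doc_count": 2},
                {"key": "industry", "doc_count": 1},
            ],
        }
    }
}

if __name__ == "__main__":
    print(build_terms_query("querySearched.keyword"))
    print(reshape_buckets(sample_response))
```

The body from build_terms_query can be sent with any HTTP or Elasticsearch client. Only the reshaping step depends on the response layout assumed here.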

Both solutions work, but you should take the heap cost of fielddata into consideration when choosing between them.

Hope it helps




Answer 2:


What did you try?

POST /searches/_search
{
  "size": 0,
  "aggs": {
    "byquerySearched": {
      "terms": {
        "field": "querySearched",
        "size": 10
      }
    }
  }
}


Source: https://stackoverflow.com/questions/58733898/elasticsearch-group-by-documents-field-and-count-occurences
