Can ElasticSearch aggregations do what SQL can do?

Submitted by …衆ロ難τιáo~ on 2019-12-11 04:33:05

Question


In Elasticsearch I need to get how often each color occurs and then how many colors share each frequency, ordered from the highest frequency to the lowest. If I have data like this:

id|name
----------
1|blue
----------
2|blue
----------
3|green
----------
4|yellow
----------
5|blue
----------
6|yellow
----------
7|purple
----------
8|purple
----------
9|purple

I need to get the count of each color and then group by that count, so that in the end all the colors that occur the same number of times fall into one group. This is how I would do it in SQL:

select 
  count(*) as 'Number of Colors', 
  i.c as 'Seen times' 
from 
    (
      select
        name as 'n', 
        count(*) as 'c'
      from
        colors
      group by name
   ) i 
group by i.c
order by i.c desc;

This would return:

Number of Colors | Seen times
------------------------------
2                | 3
------------------------------
1                | 2
------------------------------
1                | 1
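
For readers who want to check the expected numbers outside a database, the two-level grouping can be sketched in plain Python. This is only an illustration of the logic in the SQL above, not Elasticsearch code; the variable names (`rows`, `per_color`, `per_count`) are made up for the example.

```python
from collections import Counter

# Sample rows from the question: (id, name)
rows = [
    (1, "blue"), (2, "blue"), (3, "green"), (4, "yellow"), (5, "blue"),
    (6, "yellow"), (7, "purple"), (8, "purple"), (9, "purple"),
]

# Inner query: GROUP BY name, giving the number of occurrences per color
per_color = Counter(name for _, name in rows)

# Outer query: GROUP BY that count, giving the number of colors per count
per_count = Counter(per_color.values())

# ORDER BY count DESC; each tuple is (seen_times, number_of_colors)
result = sorted(per_count.items(), reverse=True)

for seen_times, number_of_colors in result:
    print(number_of_colors, seen_times)
# prints:
# 2 3
# 1 2
# 1 1
```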

How would I write this as an Elasticsearch query? I am using version 5.5.


Answer 1:


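You can use a scripted_metric aggregation with Painless scripts. Each script is explained in more detail after the query and its result. The scripts read the color from a field called colorname; substitute the name of your own field.

Query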

POST index/type/_search
{
  "size": 0, 
  "aggs": {
    "colorgroups": {
      "scripted_metric": {
        "init_script" : "params._agg.transactions = [:]",
        "map_script": "params.key = doc['colorname'].value; if(params._agg.transactions[params.key] == null){   params._agg.transactions[params.key] = 1; }else{   params._agg.transactions[params.key] ++ }",
        "combine_script": "return params._agg.transactions;",
        "reduce_script": "params.color_counters =[:]; params.groups_counters =[:]; for(shard_result in params._aggs){   for(color_name in shard_result.keySet()){     if(params.color_counters[color_name] == null){       params.color_counters[color_name] = shard_result[color_name]     }else{       params.color_counters[color_name] = params.color_counters[color_name] + shard_result[color_name]     }    } }  for(color_name in params.color_counters.keySet()){   params.group_counter = params.color_counters[color_name].toString();   if(params.groups_counters[params.group_counter] == null){     params.groups_counters[params.group_counter] = 1   }else{     params.groups_counters[params.group_counter] ++   } }  return params.groups_counters"
      }
    }
  }
}

Result

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "colorgroups": {
      "value": {
        "3": 2,
        "2": 1,
        "1": 1
      }
    }
  }
}

init_script

Initializes the per-shard state that holds the intermediate results: an empty map from color name to count.

params._agg.transactions = [:]

map_script

Runs once for each matching document. Keep it small, and reduce the data as much as possible at this step. Here it increments the counter for the document's color.

params.key = doc['colorname'].value;
if(params._agg.transactions[params.key] == null){
  params._agg.transactions[params.key] = 1;
}else{
  params._agg.transactions[params.key] ++
}
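
To make the per-document step concrete, here is a small Python sketch that mirrors the counting logic of map_script on one shard. It is an illustration only, not Painless; the function name `map_script`, the `shard_state` variable, and the choice of which documents live on the shard are assumptions made for the example.

```python
def map_script(transactions, color):
    """Mirror of the Painless map_script: bump the counter for one document."""
    if transactions.get(color) is None:
        transactions[color] = 1
    else:
        transactions[color] += 1
    return transactions


# State created by init_script: an empty map for this shard
shard_state = {}

# Hypothetical shard that happens to hold four of the nine documents
for color in ["blue", "blue", "green", "purple"]:
    map_script(shard_state, color)

print(shard_state)  # {'blue': 2, 'green': 1, 'purple': 1}
```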

combine_script

Runs once per shard after all of its documents have been mapped. Everything was already done in map_script, so it just returns the hash map.

return params._agg.transactions

reduce_script

Runs once on the coordinating node and receives the partial aggregations from every shard. It first merges them into a single HashMap of color counts, then groups the colors by their count.

params.color_counters =[:];
params.groups_counters =[:];
//merging all partial aggregations to params.color_counters
for(shard_result in params._aggs){
  for(color_name in shard_result.keySet()){
    if(params.color_counters[color_name] == null){
      params.color_counters[color_name] = shard_result[color_name]
    }else{
      params.color_counters[color_name] = params.color_counters[color_name] + shard_result[color_name]
    } 
  }
}

//Grouping by color counter to params.groups_counters
for(color_name in params.color_counters.keySet()){
  params.group_counter = params.color_counters[color_name].toString();
  if(params.groups_counters[params.group_counter] == null){
    params.groups_counters[params.group_counter] = 1
  }else{
    params.groups_counters[params.group_counter] ++
  }
}

return params.groups_counters
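
The same merge-then-group logic can be sketched in Python to check what the reduce step returns. This is an illustration under assumptions, not Painless: the function name `reduce_script` and the split of the nine sample documents across two shards are invented for the example.

```python
def reduce_script(shard_results):
    """Merge per-shard color counts, then count how many colors share each total."""
    # Step 1: merge all partial aggregations into one map of color -> total
    color_counters = {}
    for shard_result in shard_results:
        for color_name, count in shard_result.items():
            color_counters[color_name] = color_counters.get(color_name, 0) + count

    # Step 2: group by total; keys are strings, as in the Painless toString() call
    groups_counters = {}
    for total in color_counters.values():
        key = str(total)
        groups_counters[key] = groups_counters.get(key, 0) + 1
    return groups_counters


# Hypothetical partial results returned by combine_script on two shards
shards = [
    {"blue": 2, "green": 1, "purple": 1},
    {"blue": 1, "yellow": 2, "purple": 2},
]

grouped = reduce_script(shards)
# Two colors occur 3 times, one occurs twice, one occurs once
assert grouped == {"3": 2, "2": 1, "1": 1}
```

The keys are the occurrence counts and the values are the number of colors with that count, so the map reads "seen times" to "number of colors".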

You can call

Debug.explain(variable);

anywhere in a Painless script to inspect a variable while you adapt the script to your needs.




Answer 2:


Out of the box, Elasticsearch does not provide an aggregation type that takes the results of one aggregation and groups them again into a new set of aggregations, the way a nested SQL query does.

  1. One way is to run a terms aggregation and then group the returned buckets by count with a script of your own.
  2. If you want to do it in a single query, you can use a Scripted Metric Aggregation; see the link below for details:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-scripted-metric-aggregation.html
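
As a rough sketch of the first option, the grouping can be done on the client after a terms aggregation returns. The example below works on a hard-coded dictionary shaped like a terms aggregation response (a list of buckets with `key` and `doc_count`); the aggregation name `colors` and the helper `group_by_doc_count` are hypothetical. It assumes the terms aggregation was asked for enough buckets (its `size` parameter) to cover every distinct color, otherwise the grouping will be incomplete.

```python
from collections import Counter

# Assumed shape of a terms aggregation response; "colors" is a made-up name
response = {
    "aggregations": {
        "colors": {
            "buckets": [
                {"key": "blue", "doc_count": 3},
                {"key": "purple", "doc_count": 3},
                {"key": "yellow", "doc_count": 2},
                {"key": "green", "doc_count": 1},
            ]
        }
    }
}


def group_by_doc_count(resp, agg_name):
    """Return (seen_times, number_of_colors) pairs, highest count first."""
    buckets = resp["aggregations"][agg_name]["buckets"]
    counts = Counter(bucket["doc_count"] for bucket in buckets)
    return sorted(counts.items(), reverse=True)


groups = group_by_doc_count(response, "colors")
print(groups)  # [(3, 2), (2, 1), (1, 1)]
```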



Source: https://stackoverflow.com/questions/48413369/can-elasticsearch-aggregations-do-what-sql-can-do
