Question
In Elasticsearch, I need to get the frequency of each color and the number of colors at each frequency, ordered from the most frequent to the least frequent. Suppose I have data like this:
id | name
---+--------
1  | blue
2  | blue
3  | green
4  | yellow
5  | blue
6  | yellow
7  | purple
8  | purple
9  | purple
I need to get the count of each color and then group by that count, so that all the colors that occur the same number of times end up in one group. This is how I would do it in SQL:
select
    count(*) as 'Number of Colors',
    i.c as 'Seen times'
from
    (
        select
            name as 'n',
            count(*) as 'c'
        from
            colors
        group by name
    ) i
group by i.c
order by i.c desc;
This would return:
Number of Colors | Seen times
-----------------+-----------
2                | 3
1                | 2
1                | 1
How would I write this as an Elasticsearch query? I am using version 5.5.
Answer 1:
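You can use a scripted_metric aggregation with a Painless script. The scripts below read the color from a field called colorname; substitute your own field name (name in the question's data).
Query
Each script is explained in more detail below.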
POST index/type/_search
{
  "size": 0,
  "aggs": {
    "colorgroups": {
      "scripted_metric": {
        "init_script": "params._agg.transactions = [:]",
        "map_script": "params.key = doc['colorname'].value; if(params._agg.transactions[params.key] == null){ params._agg.transactions[params.key] = 1; }else{ params._agg.transactions[params.key] ++ }",
        "combine_script": "return params._agg.transactions;",
        "reduce_script": "params.color_counters =[:]; params.groups_counters =[:]; for(shard_result in params._aggs){ for(color_name in shard_result.keySet()){ if(params.color_counters[color_name] == null){ params.color_counters[color_name] = shard_result[color_name] }else{ params.color_counters[color_name] = params.color_counters[color_name] + shard_result[color_name] } } } for(color_name in params.color_counters.keySet()){ params.group_counter = params.color_counters[color_name].toString(); if(params.groups_counters[params.group_counter] == null){ params.groups_counters[params.group_counter] = 1 }else{ params.groups_counters[params.group_counter] ++ } } return params.groups_counters"
      }
    }
  }
}
Result
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "colorgroups": {
      "value": {
        "3": 2,
        "2": 1,
        "1": 1
      }
    }
  }
}
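The value object is a JSON map, so the order of its keys is not guaranteed, while the question asks for rows ordered from highest to lowest. A minimal client-side sketch in Python, assuming (as the reduce script produces) that each key is a "seen times" count serialized as a string and each value is the number of colors with that count. The response dict is a hand-written stand-in for the search response, and to_sorted_rows is a hypothetical helper name, not part of any Elasticsearch client:

```python
def to_sorted_rows(response, agg_name="colorgroups"):
    """Return (number_of_colors, seen_times) rows, highest seen_times first."""
    value = response["aggregations"][agg_name]["value"]
    # Keys arrive as strings ("3"), so convert them back to integers before sorting.
    rows = [(num_colors, int(seen_times)) for seen_times, num_colors in value.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hand-written stand-in for the relevant part of the search response (assumption).
response = {"aggregations": {"colorgroups": {"value": {"2": 1, "3": 2, "1": 1}}}}

rows = to_sorted_rows(response)
print("Number of Colors | Seen times")
for num_colors, seen_times in rows:
    print(f"{num_colors} | {seen_times}")
```

Sorting on the client keeps the reduce script simple; the map keys have to be converted to integers first, otherwise "10" would sort before "2".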
init_script
Initializes the per-shard state that holds the intermediate results (here, an empty map).
params._agg.transactions = [:]
map_script
Runs once for each document. Try to keep it small and reduce the data as much as possible in this step.
params.key = doc['colorname'].value;
if(params._agg.transactions[params.key] == null){
    params._agg.transactions[params.key] = 1;
}else{
    params._agg.transactions[params.key] ++
}
combine_script
Runs once for each shard. Everything was already done in map_script, so we just return the hash map.
return params._agg.transactions
reduce_script
Works on the partial aggregations returned by each shard. We first need to merge them into one HashMap, and then group by counter value.
params.color_counters =[:];
params.groups_counters =[:];
//merging all partial aggregations to params.color_counters
for(shard_result in params._aggs){
    for(color_name in shard_result.keySet()){
        if(params.color_counters[color_name] == null){
            params.color_counters[color_name] = shard_result[color_name]
        }else{
            params.color_counters[color_name] = params.color_counters[color_name] + shard_result[color_name]
        }
    }
}
//Grouping by color counter to params.groups_counters
for(color_name in params.color_counters.keySet()){
    params.group_counter = params.color_counters[color_name].toString();
    if(params.groups_counters[params.group_counter] == null){
        params.groups_counters[params.group_counter] = 1
    }else{
        params.groups_counters[params.group_counter] ++
    }
}
return params.groups_counters
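To see how the four scripts fit together, here is a rough Python simulation of the same flow, not Painless and not something Elasticsearch runs. It assumes the nine documents from the question are spread over three shards in an arbitrary way (the real distribution depends on routing); the function names map_and_combine and reduce_shards are made up for illustration:

```python
def map_and_combine(shard_docs):
    """Mimic init_script + map_script + combine_script for a single shard."""
    transactions = {}  # init_script: empty map
    for doc in shard_docs:  # map_script: runs once per document
        key = doc["colorname"]
        transactions[key] = transactions.get(key, 0) + 1
    return transactions  # combine_script: return the per-shard map


def reduce_shards(shard_results):
    """Mimic reduce_script: merge the shard maps, then group colors by count."""
    color_counters = {}
    for shard_result in shard_results:
        for color_name, count in shard_result.items():
            color_counters[color_name] = color_counters.get(color_name, 0) + count

    groups_counters = {}
    for count in color_counters.values():
        group_key = str(count)  # the Painless script also uses a string key
        groups_counters[group_key] = groups_counters.get(group_key, 0) + 1
    return groups_counters


# Arbitrary split of the question's nine documents over three shards (assumption).
shards = [
    [{"colorname": "blue"}, {"colorname": "blue"}, {"colorname": "green"}, {"colorname": "yellow"}],
    [{"colorname": "blue"}, {"colorname": "yellow"}],
    [{"colorname": "purple"}, {"colorname": "purple"}, {"colorname": "purple"}],
]

result = reduce_shards([map_and_combine(shard) for shard in shards])
print(result)
```

The merge step in reduce_shards matters because the same color can live on several shards (blue and yellow above), so no single shard knows the final count of a color.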
You can call
Debug.explain(variable);
anywhere in the Painless script to inspect a variable and adjust the script to your needs.
Answer 2:
By default, Elasticsearch does not provide an aggregation type that takes the results of one aggregation and groups them into a new set of aggregations, the way the nested SQL query does.
- One way of doing it is to run a terms aggregation and then group its results using a script.
- If you want to do this in a single query, you can use a Scripted Metric Aggregation; see the link below for more details:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-scripted-metric-aggregation.html
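The first option can be sketched as a small client-side step in Python. This assumes a terms aggregation on the color field has already been run under the hypothetical name colors, and that its buckets have the usual key / doc_count shape; the terms_response dict is hand-written for illustration, and group_by_doc_count is a made-up helper name:

```python
from collections import Counter


def group_by_doc_count(buckets):
    """Return (number_of_colors, seen_times) rows, highest seen_times first."""
    # Count how many terms (colors) share each doc_count.
    groups = Counter(bucket["doc_count"] for bucket in buckets)
    rows = [(num_colors, seen_times) for seen_times, num_colors in groups.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hand-written stand-in for a terms aggregation response (assumption).
terms_response = {
    "aggregations": {
        "colors": {
            "buckets": [
                {"key": "blue", "doc_count": 3},
                {"key": "purple", "doc_count": 3},
                {"key": "yellow", "doc_count": 2},
                {"key": "green", "doc_count": 1},
            ]
        }
    }
}

rows = group_by_doc_count(terms_response["aggregations"]["colors"]["buckets"])
for num_colors, seen_times in rows:
    print(f"{num_colors} | {seen_times}")
```

Keep in mind that a terms aggregation returns only the top buckets (10 by default, controlled by size) and that its document counts can be approximate across shards, so this approach is only exact when all terms are returned.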
Source: https://stackoverflow.com/questions/48413369/can-elasticsearch-aggregations-do-what-sql-can-do