Question
I am having a hard time converting this simple SQL Query below into Druid:
SELECT country, city, Count(*)
FROM people_data
WHERE name = 'Mary'
GROUP BY country, city;
So far I have come up with this query:
{
  "queryType": "groupBy",
  "dataSource": "people_data",
  "granularity": "all",
  "metric": "num_of_pages",
  "dimensions": ["country", "city"],
  "filter": {
    "type": "and",
    "fields": [
      {
        "type": "in",
        "dimension": "name",
        "values": ["Mary"]
      },
      {
        "type": "javascript",
        "dimension": "email",
        "function": "function(value) { return (value.length !== 0) }"
      }
    ]
  },
  "aggregations": [
    { "type": "longSum", "name": "num_of_pages", "fieldName": "count" }
  ],
  "intervals": ["2016-07-20/2016-07-21"]
}
The query above runs, but it doesn't seem like the groupBy is even being evaluated against the Druid datasource, since I see people in my output with names other than Mary. Does anyone have any input on how to make this work?
Answer 1:
The simple answer is that you cannot select arbitrary dimensions in your groupBy queries.
Strictly speaking, even the SQL query does not make sense. If, for a given combination of country and city, there are many different values of name and street, how do you squeeze them into a single row? You have to aggregate them, e.g. by using the max function.
In this case you can include the same column in your data as both a dimension and a metric, e.g. name_dim and name_metric, and include a corresponding aggregation over the metric, e.g. max(name_metric).
Please note that if these columns (name, etc.) have high-cardinality values, this will kill Druid's roll-up feature.
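As a rough sketch of how the SQL in the question could map onto a native groupBy query: only the grouped columns (country, city) are listed under dimensions, while name appears solely in the filter. This assumes, as the question's own query does, that the datasource has an ingestion-time metric named count and that the interval shown is the one of interest. A selector filter is used here for the single-value match, and the email condition is left out because it is not part of the SQL.

```json
{
  "queryType": "groupBy",
  "dataSource": "people_data",
  "granularity": "all",
  "dimensions": ["country", "city"],
  "filter": {
    "type": "selector",
    "dimension": "name",
    "value": "Mary"
  },
  "aggregations": [
    { "type": "longSum", "name": "num_of_pages", "fieldName": "count" }
  ],
  "intervals": ["2016-07-20/2016-07-21"]
}
```

Each result row should then carry country, city and the summed count. Per-person columns such as name are not returned, which is why they cannot be read back from the grouped output without an aggregation of their own.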
Source: https://stackoverflow.com/questions/38546593/how-to-perform-a-select-in-the-results-returned-from-a-group-by-druid