How to perform a SELECT on the results returned from a GROUP BY in Druid?

Submitted by 我的未来我决定 on 2019-12-04 04:46:43

Question


I am having a hard time converting the simple SQL query below into a Druid query:

SELECT country, city, COUNT(*)
FROM people_data
WHERE name = 'Mary'
GROUP BY country, city;

So I came up with this query so far:

{
  "queryType": "groupBy",
  "dataSource" : "people_data",
  "granularity": "all",
  "metric" : "num_of_pages",
  "dimensions": ["country", "city"],
  "filter" : {
      "type" : "and",
      "fields" : [
          {
            "type": "in",
            "dimension": "name",
            "values": ["Mary"]
          },
          {
            "type" : "javascript",
            "dimension" : "email",
            "function" : "function(value) { return (value.length !== 0) }"
          }
      ]
  },
  "aggregations": [
    { "type": "longSum", "name": "num_of_pages", "fieldName": "count" }
  ],
  "intervals": [ "2016-07-20/2016-07-21" ]
}

The query above runs but it doesn't seem like groupBy in the Druid datasource is even being evaluated since I see people in my output with names other than Mary. Does anyone have any input on how to make this work?


Answer 1:


The simple answer is that you cannot select arbitrary dimensions in a groupBy query.
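
A groupBy result contains only the grouping dimensions and the aggregated values. As a minimal sketch (not a verified fix for the question's problem), the SQL could map onto a native groupBy query roughly as below. It assumes the datasource has a rolled-up `count` metric, as the question's own aggregation suggests, and it leaves out the `metric` key, which belongs to topN queries rather than groupBy. The helper name `build_group_by_query` and the output name `num_rows` are made up for illustration.

```python
import json


def build_group_by_query(name, interval="2016-07-20/2016-07-21"):
    """Build a native Druid groupBy query as a plain dict.

    Only the grouping dimensions and the aggregate are returned by Druid;
    `name` is used purely as a filter, not as an output column.
    """
    return {
        "queryType": "groupBy",
        "dataSource": "people_data",
        "granularity": "all",
        "dimensions": ["country", "city"],
        # A selector filter is an equality match: name = <name>.
        "filter": {"type": "selector", "dimension": "name", "value": name},
        "aggregations": [
            # Assumption: a `count` metric was created at ingestion time.
            {"type": "longSum", "name": "num_rows", "fieldName": "count"}
        ],
        "intervals": [interval],
    }


query = build_group_by_query("Mary")
print(json.dumps(query, indent=2))
```

The printed JSON is what would be sent as the body of the query request.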

Strictly speaking, even the SQL query does not make sense. If, for a given combination of country and city, there are many different values of name and street, how do you squeeze them into a single row? You have to aggregate them, e.g. by using a max function.
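
The point can be illustrated outside Druid with a small Python sketch. The rows are invented for illustration and `group_by` is a hypothetical helper, not a Druid API: each (country, city) group collapses to one row, so a non-grouped column such as name only survives through an aggregate like max.

```python
from collections import defaultdict

# Hypothetical rows, invented for illustration.
rows = [
    {"country": "US", "city": "Boston", "name": "Mary", "street": "Elm St"},
    {"country": "US", "city": "Boston", "name": "Anna", "street": "Oak St"},
    {"country": "FR", "city": "Paris", "name": "Mary", "street": "Rue Cler"},
]


def group_by(rows, keys, agg_column):
    """Group rows by `keys`; return count and max(agg_column) per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row[agg_column])
    return {
        key: {"count": len(values), "max_" + agg_column: max(values)}
        for key, values in groups.items()
    }


result = group_by(rows, ["country", "city"], "name")
for key, value in result.items():
    print(key, value)
```

The Boston group holds two different names, yet the output has a single row for it; only the aggregated value is kept.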

In this case, you can ingest the same column as both a dimension and a metric, e.g. name_dim and name_metric, and include a corresponding aggregation over the metric, such as max(name_metric).

Please note that if these columns (name, etc.) have high-cardinality values, including them as dimensions will defeat Druid's roll-up feature.
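
A simplified sketch of why this hurts roll-up: events that share the same dimension values are combined into one stored row, so every extra high-cardinality dimension leaves fewer events to combine. The events and the `rolled_up_rows` helper are hypothetical, and the sketch ignores the timestamp bucketing that real roll-up also applies.

```python
# Hypothetical raw events, invented for illustration.
events = [
    {"country": "US", "city": "Boston", "name": "Mary"},
    {"country": "US", "city": "Boston", "name": "Anna"},
    {"country": "US", "city": "Boston", "name": "Lena"},
    {"country": "FR", "city": "Paris", "name": "Mary"},
]


def rolled_up_rows(events, dimensions):
    """Count the rows left after merging events that share the same
    values for every listed dimension."""
    combos = {tuple(event[d] for d in dimensions) for event in events}
    return len(combos)


without_name = rolled_up_rows(events, ["country", "city"])
with_name = rolled_up_rows(events, ["country", "city", "name"])
print(without_name, with_name)
```

With name as a dimension, no two of these events share all dimension values, so nothing is merged.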



Source: https://stackoverflow.com/questions/38546593/how-to-perform-a-select-in-the-results-returned-from-a-group-by-druid
