Why do you need to include a field in GROUP BY when using OVER (PARTITION BY x)?

流过昼夜 提交于 2019-12-10 15:05:25

问题


I have a table for which I want to do a simple sum of a field, grouped by two columns. I then want the total for all values for each year_num.

See example: http://rextester.com/QSLRS68794

This query is throwing: "42803: column "foo.num_cust" must appear in the GROUP BY clause or be used in an aggregate function", and I cannot figure out why. Why would an aggregate function using the OVER (PARTITION BY x) require the summed field to be in GROUP BY??

select 
    year_num
    ,age_bucket
    ,sum(num_cust)
    --,sum(num_cust) over (partition by year_num)  --THROWS ERROR!!
from
    foo
group by
    year_num
    ,age_bucket
order by 1,2

TABLE:

| loc_id |  year_num |  gen |  cust_category |  cust_age |  num_cust |  age_bucket |
|--------|-----------|------|----------------|-----------|-----------|-------------|
| 1      | 2016      | M    | cash           | 41        | 2         | 04_<45      |
| 1      | 2016      | F    | Prepaid        | 41        | 1         | 03_<35      |
| 1      | 2016      | F    | cc             | 61        | 1         | 05_45+      |
| 1      | 2016      | F    | cc             | 19        | 2         | 02_<25      |
| 1      | 2016      | M    | cc             | 64        | 1         | 05_45+      |
| 1      | 2016      | F    | cash           | 46        | 1         | 05_45+      |
| 1      | 2016      | F    | cash           | 27        | 3         | 03_<35      |
| 1      | 2016      | M    | cash           | 42        | 1         | 04_<45      |
| 1      | 2017      | F    | cc             | 35        | 1         | 04_<45      |
| 1      | 2017      | F    | cc             | 37        | 1         | 04_<45      |
| 1      | 2017      | F    | cash           | 46        | 1         | 05_45+      |
| 1      | 2016      | F    | cash           | 19        | 4         | 02_<25      |
| 1      | 2017      | M    | cash           | 43        | 1         | 04_<45      |
| 1      | 2017      | M    | cash           | 29        | 1         | 03_<35      |
| 1      | 2016      | F    | cc             | 13        | 1         | 01_<18      |
| 1      | 2017      | F    | cash           | 16        | 2         | 01_<18      |
| 1      | 2016      | F    | cc             | 17        | 2         | 01_<18      |
| 1      | 2016      | M    | cc             | 17        | 2         | 01_<18      |
| 1      | 2017      | F    | cash           | 18        | 9         | 02_<25      |

DESIRED OUTPUT:

| year_num | age_bucket | sum | sum over (year_num) |
|----------|------------|-----|---------------------|
| 2016     | 01_<18     | 5   | 21                  |
| 2016     | 02_<25     | 6   | 21                  |
| 2016     | 03_<35     | 4   | 21                  |
| 2016     | 04_<45     | 3   | 21                  |
| 2016     | 05_45+     | 3   | 21                  |
| 2017     | 01_<18     | 2   | 16                  |
| 2017     | 02_<25     | 9   | 16                  |
| 2017     | 03_<35     | 1   | 16                  |
| 2017     | 04_<45     | 3   | 16                  |
| 2017     | 05_45+     | 1   | 16                  |

回答1:


You need to nest the sum()s:

select year_num, age_bucket, sum(num_cust),
       sum(sum(num_cust)) over (partition by year_num)  --WORKS!!
from foo
group by year_num, age_bucket
order by 1, 2;

Why? Well, the window function is not doing aggregation. The argument needs to be an expression that can be evaluated after the group by (because this is an aggregation query). Because num_cust is not a group by key, it needs an aggregation function.

Perhaps this is clearer if you used a subquery:

select year_num, age_bucket, sum_num_cust,
       sum(sum_num_cust) over (partition by year_num)
from (select year_num, age_bucket, sum(num_cust) as sum_num_cust
      from foo
      group by year_num, age_bucket
     ) ya
order by 1, 2;

These two queries do exactly the same thing. But with the subquery it should be more obvious why you need the extra aggregation.



来源:https://stackoverflow.com/questions/46612628/why-do-you-need-to-include-a-field-in-group-by-when-using-over-partition-by-x

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!