问题
I create a table in HIVE. It has the following columns:
id bigint, rank bigint, date string
I want to get avg(rank) per month. I can use this command. It works.
select a.lens_id, avg(a.rank)
from tableA a
group by a.lens_id, year(a.date_saved), month(a.date_saved);
However, I also want to get date information. I use this command:
select a.lens_id, avg(a.rank), a.date_saved
from lensrank_archive a
group by a.lens_id, year(a.date_saved), month(a.date_saved);
It complains: Expression Not In Group By Key
回答1:
The full error message should be in the format Expression Not In Group By Key [value]
.
The [value]
will tell you what expression needs to be in the Group By
.
Just looking at the two queries, I'd say that you need to add a.date_saved
explicitly to the Group By
.
回答2:
A walk around is to put the additional field in a collect_set and return the first element of the set. For example
select a.lens_id, avg(a.rank), collect_set(a.date_saved)[0]
from lensrank_archive a
group by a.lens_id, year(a.date_saved), month(a.date_saved);
回答3:
This is because there is more than one ‘date_saved’ record under your group by. You can turn these ‘date_saved’ records into arrays and output them.
来源:https://stackoverflow.com/questions/5746687/hive-expression-not-in-group-by-key