Question
I have some data that I want to group by multiple columns, run an aggregation function on, and then transpose (pivot) into separate columns, using Hive.
For example, given this input:
hr  type  value
01  a     10
01  b     20
01  c     50
01  a     30
02  c     10
02  b     90
02  a     80
I want to produce this output:
hr  a_avg  b_avg  c_avg
01  20     20     50
02  80     90     10
There is one column for each distinct type in my input. For example, a_avg is the average of the a values for each hour.
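To pin down the intended semantics, here is a small Python sketch (plain Python, not Hive) that computes the expected output from the sample input: first the average value per (hr, type) pair, then one column per distinct type. The function name `pivot_avg` is hypothetical and only illustrates the transformation being asked for.

```python
from collections import defaultdict

# Sample input from the question: (hr, type, value)
rows = [
    ("01", "a", 10), ("01", "b", 20), ("01", "c", 50), ("01", "a", 30),
    ("02", "c", 10), ("02", "b", 90), ("02", "a", 80),
]

def pivot_avg(rows):
    """Average value per (hr, type), then spread each type into its own column."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for hr, typ, value in rows:
        sums[(hr, typ)] += value
        counts[(hr, typ)] += 1
    types = sorted({typ for _, typ, _ in rows})
    hours = sorted({hr for hr, _, _ in rows})
    table = []
    for hr in hours:
        # None marks an (hr, type) pair that has no input rows
        row = [hr] + [
            sums[(hr, t)] / counts[(hr, t)] if counts[(hr, t)] else None
            for t in types
        ]
        table.append(row)
    header = ["hr"] + [f"{t}_avg" for t in types]
    return header, table

header, table = pivot_avg(rows)
print(header)
for row in table:
    print(row)
```

For hour 01 the two a values are 10 and 30, so a_avg is 20, matching the expected output.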
How can I do this in Hive? I suspect I might need Brickhouse's collect UDFs: https://github.com/klout/brickhouse/wiki/Collect-UDFs
So far the best I can think of is to use multiple group-by clauses, but that won't transpose the data into multiple columns.
Any ideas?
Answer 1:
You don't necessarily need Brickhouse, but it definitely makes this easier. Here is what I have in mind:
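-- Register Brickhouse's collect UDAF first; the jar path is a placeholder.
ADD JAR /path/to/brickhouse.jar;
CREATE TEMPORARY FUNCTION collect AS 'brickhouse.udf.collect.CollectUDAF';

select hr
     , type_map['a'] as a_avg
     , type_map['b'] as b_avg
     , type_map['c'] as c_avg
from (
    -- one row per hr, holding a map from type to that type's average
    select hr
         , collect(type, avg_value) as type_map  -- Brickhouse collect; creates a map
    from (
        -- average value for each (hr, type) pair
        select hr
             , type
             , avg(value) as avg_value
        from db.table
        group by hr, type
    ) x
    group by hr
) y;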
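Since Brickhouse is optional, one common Brickhouse-free alternative (swapped in here; it is not the query above) is conditional aggregation: one AVG(CASE WHEN ...) per type. The CASE yields NULL for rows of other types and AVG ignores NULLs, so each column averages only its own type. Hive supports CASE and NULL-ignoring AVG, so the same SELECT should work against db.table; the sketch below runs it through Python's built-in sqlite3 only so it can be executed locally, and the table name `t` is hypothetical.

```python
import sqlite3

# In-memory SQLite table standing in for Hive's db.table (assumption:
# SQLite is used only so the query can be run locally).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (hr TEXT, type TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("01", "a", 10), ("01", "b", 20), ("01", "c", 50), ("01", "a", 30),
     ("02", "c", 10), ("02", "b", 90), ("02", "a", 80)],
)

# Conditional aggregation: CASE returns NULL for other types, and AVG
# skips NULLs, so each column is the average of a single type.
query = """
SELECT hr,
       AVG(CASE WHEN type = 'a' THEN value END) AS a_avg,
       AVG(CASE WHEN type = 'b' THEN value END) AS b_avg,
       AVG(CASE WHEN type = 'c' THEN value END) AS c_avg
FROM t
GROUP BY hr
ORDER BY hr
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
conn.close()
```

Like the map lookup (type_map['a'] and so on), this version still needs one hard-coded output column per type; it just avoids the extra UDF.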
Source: https://stackoverflow.com/questions/30109551/how-to-group-by-multiple-columns-and-then-transpose-in-hive