How to group by multiple columns and then transpose in Hive

Question

I have some data that I want to group by multiple columns, aggregate, and then transpose so that each group becomes its own column, all in Hive.

For example, given this input:

hr  type value
01  a    10
01  b    20
01  c    50
01  a    30
02  c    10
02  b    90
02  a    80

I want to produce this output:

hr  a_avg  b_avg  c_avg
01  20     20     50
02  80     90     10

There should be one column for each distinct type in my input. For example, a_avg is the average of the a values for each hour.

How can I do this in Hive? I am guessing I might need the Brickhouse collect UDFs: https://github.com/klout/brickhouse/wiki/Collect-UDFs

So far the best I can think of is to use multiple group-by clauses, but that won't transpose the data into multiple columns.

Any ideas?

Answer 1:

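You don't necessarily need Brickhouse, but it definitely makes this easier. I'm thinking of something like the query below: the innermost query averages value per hr and type, the middle query collects those (type, avg_value) pairs into one map per hr, and the outer query reads each type out of the map as its own column.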

select hr
  , type_map['a'] a_avg
  , type_map['b'] b_avg
  , type_map['c'] c_avg
from (
  select hr
    , collect(type, avg_value) type_map -- Brickhouse collect; creates a map
  from (
    select hr
      , type
      , avg( value ) avg_value
    from db.table
    group by hr, type ) x
  group by hr ) y
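
If you'd rather not pull in Brickhouse at all, a common alternative (not what the query above does) is conditional aggregation. This is only a sketch under a few assumptions: the list of types (a, b, c) is known in advance and hard-coded, and db.table is the same placeholder used above, so substitute your real database and table name.

```sql
-- Sketch: pivot with conditional aggregation, no UDF needed.
-- Assumptions: the columns are hr, type, value as in the question;
-- the types 'a', 'b', 'c' are known up front; db.table is a placeholder name.
select hr
  -- case yields NULL for rows of other types, and avg() ignores NULLs,
  -- so each column only averages the values of its own type
  , avg(case when type = 'a' then value end) as a_avg
  , avg(case when type = 'b' then value end) as b_avg
  , avg(case when type = 'c' then value end) as c_avg
from db.table
group by hr;
```

The trade-off is that every new type needs a new line in the select list, but the same is true of the type_map['a'] lookups in the Brickhouse version, since Hive needs the output columns spelled out either way.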

Source: https://stackoverflow.com/questions/30109551/how-to-group-by-multiple-columns-and-then-transpose-in-hive

Tags

Hadoop

Hive

data-analysis
