Is there a way to transpose data in Hive?

天命终不由人 2020-12-06 21:43

Can data in Hive be transposed? That is, can the rows become columns and the columns become rows? If there is no built-in function for this, is there a way to do it in a couple of steps?

2 Answers
  •  天命终不由人
    2020-12-06 21:55

    As Mark pointed out, there is no easy way to do this in Hive, since Hive has no PIVOT operator. You may also run into problems with the case/when 'trick', because there are multiple value columns (proc1, proc2, proc3).

    For testing purposes, you could try a different approach:

    select v, o1, o2, o3 from (
      select k, 
             v,
             LEAD(v,3) OVER() as o1,
             LEAD(v,6) OVER() as o2,
             LEAD(v,9) OVER() as o3
      from (select transform(name,proc1,proc2,proc3) using 'python strm.py' AS (k, v) 
        from input_table) q1
    ) q2 where k = 'A1';
    

    where strm.py:

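    import sys

    # Emit each proc column of an input row as its own (name, value) row.
    for line in sys.stdin:
        line = line.strip()
        name, proc1, proc2, proc3 = line.split('\t')
        # print() with one pre-formatted string works on Python 2 and 3.
        print('%s\t%s' % (name, proc1))
        print('%s\t%s' % (name, proc2))
        print('%s\t%s' % (name, proc3))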
    

    The trick here is to use a Python script in the map phase that emits each column of a row as a separate row. Since there are 3 proc columns, every third emitted row belongs to the same output row, and we collect those values by peeking forward with LEAD.
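To see why the offsets 3, 6 and 9 line up, the following is a minimal Python sketch that mimics the two steps outside of Hive. The sample values (`a1p1`, `a2p1`, ...) and the helper names `melt`, `lead` and `transpose_via_lead` are made up for illustration. It also assumes the rows keep the order in which the script emitted them, which `OVER()` without an `ORDER BY` does not guarantee in Hive.

```python
def melt(rows):
    """Mimic strm.py: emit one (name, value) pair per proc column."""
    pairs = []
    for name, *procs in rows:
        for value in procs:
            pairs.append((name, value))
    return pairs


def lead(values, index, offset):
    """Mimic LEAD(v, offset) OVER(): the value `offset` rows ahead, or None."""
    target = index + offset
    return values[target] if target < len(values) else None


def transpose_via_lead(rows, first_key):
    """Keep the rows of the first key and pull the other rows' values in via lead()."""
    n_cols = len(rows[0]) - 1          # number of proc columns (3 in the answer)
    pairs = melt(rows)
    values = [v for _, v in pairs]
    result = []
    for i, (k, v) in enumerate(pairs):
        if k != first_key:             # same role as "where k = 'A1'"
            continue
        peeked = [lead(values, i, n_cols * j) for j in range(1, len(rows))]
        result.append([v] + peeked)
    return result


# Hypothetical input table: name, proc1, proc2, proc3
sample = [
    ("A1", "a1p1", "a1p2", "a1p3"),
    ("A2", "a2p1", "a2p2", "a2p3"),
    ("A3", "a3p1", "a3p2", "a3p3"),
    ("A4", "a4p1", "a4p2", "a4p3"),
]

result = transpose_via_lead(sample, "A1")
for row in result:
    print("\t".join(row))
```

Each printed line holds one proc column across all four input rows, which is the transposed layout the query returns as `v, o1, o2, o3`.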

    Although this query does the job, it has a drawback: as the input grows, the query has to keep peeking ahead at every third element, which may hurt performance. Still, you may evaluate it for testing purposes.
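One way to see how the query grows with the input is to generate it. The sketch below is a hypothetical helper (`build_transpose_query` is not part of Hive or of the original answer) that rebuilds the query text above for a given number of input rows, assuming the same `input_table`, the same `strm.py` script and exactly 3 proc columns. Every extra input row adds one more `LEAD` column with an offset 3 larger than the previous one.

```python
N_PROC_COLS = 3  # proc1, proc2, proc3, as in the answer


def build_transpose_query(n_rows, first_key="A1"):
    """Return the LEAD-based query text for an input with n_rows rows."""
    if n_rows < 2:
        raise ValueError("need at least 2 input rows to transpose")
    # One LEAD column per additional input row: offsets 3, 6, 9, ...
    leads = ",\n".join(
        "         LEAD(v,%d) OVER() as o%d" % (N_PROC_COLS * i, i)
        for i in range(1, n_rows)
    )
    out_cols = ", ".join(["v"] + ["o%d" % i for i in range(1, n_rows)])
    return (
        "select %s from (\n"
        "  select k,\n"
        "         v,\n"
        "%s\n"
        "  from (select transform(name,proc1,proc2,proc3) "
        "using 'python strm.py' AS (k, v)\n"
        "    from input_table) q1\n"
        ") q2 where k = '%s';" % (out_cols, leads, first_key)
    )


query = build_transpose_query(4)
print(query)
```

With `n_rows=4` this reproduces the three `LEAD` columns of the query above; with 5 rows a fourth column `LEAD(v,12) OVER() as o4` appears, and so on.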
