Hive Explode / Lateral View multiple arrays

本小妞迷上赌 提交于 2019-11-27 03:51:54

You can use the numeric_range and array_index UDFs from Brickhouse ( http://github.com/klout/brickhouse ) to solve this problem. There is an informative blog posting describing in detail over at http://brickhouseconfessions.wordpress.com/2013/03/07/exploding-multiple-arrays-at-the-same-time-with-numeric_range/

Using those UDFs, the query would be something like

select cookie,
   array_index( product_id_arr, n ) as product_id,
   array_index( catalog_id_arr, n ) as catalog_id,
   array_index( qty_id_arr, n ) as qty
from table
lateral view numeric_range( size( product_id_arr )) n1 as n;

I found a very good solution to this problem without using any UDF, posexplode is a very good solution :

SELECT COOKIE ,
ePRODUCT_ID,
eCAT_ID,
eQTY
FROM TABLE 
LATERAL VIEW posexplode(PRODUCT_ID) ePRODUCT_IDAS seqp, ePRODUCT_ID
LATERAL VIEW posexplode(CAT_ID) eCAT_ID AS seqc, eCAT_ID
LATERAL VIEW posexplode(QTY) eQTY AS seqq, eDateReported
WHERE seqp = seqc AND seqc = seqq;

You can do this by using posexplode, which will provide an integer between 0 and n to indicate the position in the array for each element in the array. Then use this integer - call it pos (for position) to get the matching values in other arrays, using block notation, like this:

select 
  cookie, 
  n.pos as position, 
  n.prd_id as product_id,
  cat_id[pos] as catalog_id,
  qty[pos] as qty
from table
lateral view posexplode(product_id_arr) n as pos, prd_id;

This avoids the using imported UDF's as well as joining various arrays together (this has much better performance).

Lakshman Purihella

I tried to work out on your scenario... please try this code -

create table info(cookie string,productid int,catid string,qty string);

insert into table info
select cookie,productid[myprod],categoryid[mycat],qty[myqty] from table
lateral view posexplode(productid) pro as myprod,pro
lateral view posexplode(categoryid) cate as mycat,cate
lateral view posexplode(qty) q as myqty,q
where myprod=mycat and mycat=myqty;

Note - In the above statements, if you place - select cookie,myprod,mycat,myqty from table in place of select cookie,productid[myprod],categoryid[mycat],qty[myqty] from table in the output you will get the index of the element in the array of productid, categoryid and qty. Hope this will be helpful.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!