Array intersect Hive

Anonymous (unverified), submitted on 2019-12-03 07:50:05

Question:

I have two arrays of strings in Hive, like:

{'value1','value2','value3'} {'value1', 'value2'}

I want to merge the arrays without duplicates, so that the result is:

{'value1','value2','value3'}

How can I do this in Hive?

Answer 1:

You will need a UDF for this. Klout has open-sourced a collection of Hive UDFs, hosted on GitHub, under the package brickhouse. Several of them serve exactly this purpose. Download and build the project, then add the JAR to your Hive session. Here is an example:

CREATE TEMPORARY FUNCTION combine AS 'brickhouse.udf.collect.CombineUDF';
CREATE TEMPORARY FUNCTION combine_unique AS 'brickhouse.udf.collect.CombineUniqueUDAF';

select combine_unique(combine(array('a','b','c'), array('b','c','d'))) from reqtable;

OK
["d","b","c","a"]


Answer 2:

A native solution, without any UDF, could be:

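-- my_table, id and list are placeholders for your own table and column names
-- (the bare word "table" is a reserved keyword in Hive)
SELECT id, collect_set(item)
FROM my_table
LATERAL VIEW explode(list) lTable AS item
GROUP BY id;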

First explode the array into one row per element with LATERAL VIEW, then group by id and rebuild the array with collect_set, which removes duplicates.
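The query above handles a single array column. For the two arrays from the question, the same idea can be sketched roughly as follows. This is only an illustration under assumptions: the CTE name arrays_tbl and the columns id, arr1 and arr2 are hypothetical, and it relies on a Hive version that supports CTEs and SELECT without FROM (0.13 or later).

```sql
-- Hypothetical inline data standing in for a real table
-- with two ARRAY<STRING> columns, arr1 and arr2.
WITH arrays_tbl AS (
  SELECT 1 AS id,
         array('value1', 'value2', 'value3') AS arr1,
         array('value1', 'value2') AS arr2
)
SELECT id,
       collect_set(item) AS merged   -- collect_set drops duplicate items
FROM (
  -- One row per element of arr1
  SELECT id, item
  FROM arrays_tbl
  LATERAL VIEW explode(arr1) t1 AS item
  UNION ALL
  -- One row per element of arr2
  SELECT id, item
  FROM arrays_tbl
  LATERAL VIEW explode(arr2) t2 AS item
) u
GROUP BY id;
```

For the sample row, merged should contain value1, value2 and value3, but collect_set does not guarantee any particular element order. Also note that a row whose arrays are both empty or NULL yields no exploded rows and therefore disappears from the result; LATERAL VIEW OUTER keeps such a row, but then emits a NULL item that needs to be handled.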


