How to compute the intersections and unions of two arrays in Hive?

社会主义新天地 submitted on 2019-12-24 16:52:25

Question


For example, the intersection

select intersect(array("A","B"), array("B","C"))

should return

["B"]

and the union

select union(array("A","B"), array("B","C"))

should return

["A","B","C"]

What's the best way to do this in Hive? I have checked the Hive documentation but cannot find anything relevant.


Answer 1:


Klout has published a collection of Hive UDFs on GitHub that solves this problem. Download it, build the JAR, and add the JAR to Hive. Example for the union:

-- Register the UDFs from the JAR under short names
CREATE TEMPORARY FUNCTION combine AS 'brickhouse.udf.collect.CombineUDF';
CREATE TEMPORARY FUNCTION combine_unique AS 'brickhouse.udf.collect.CombineUniqueUDAF';

-- Union: concatenate both arrays, then keep the unique elements
select combine_unique(combine(array('a','b','c'), array('b','c','d'))) from reqtable;

OK
["d","b","c","a"]
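The example above covers only the union. If adding an external JAR is not an option, both operations can probably be approximated with Hive built-ins alone. The following is a minimal sketch, assuming a Hive version that allows SELECT without a FROM clause (0.13 or later); the aliases x, t, union_result and intersect_result are made up for illustration and are not part of the original answer.

```sql
-- Union: explode both arrays into rows, then collect the distinct values.
SELECT collect_set(x) AS union_result
FROM (
  SELECT explode(array('A','B')) AS x
  UNION ALL
  SELECT explode(array('B','C')) AS x
) t;

-- Intersection: explode the first array and keep only the values
-- that array_contains also finds in the second array.
SELECT collect_set(x) AS intersect_result
FROM (
  SELECT explode(array('A','B')) AS x
) t
WHERE array_contains(array('B','C'), x);
```

As with the combine_unique output above, collect_set does not guarantee element order, so wrap the result in sort_array if a stable order matters.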


Source: https://stackoverflow.com/questions/36145842/how-to-compute-the-intersections-and-unions-of-two-arrays-in-hive
