How to get array/bag of elements from Hive group by operator?

£可爱£侵袭症+ 提交于 2019-11-30 17:35:22
Daniel Koverman

The built in aggregate function collect_set (doumented here) gets you almost what you want. It would actually work on your example input:

SELECT F1, collect_set(F2)
FROM sample_table
GROUP BY F1

Unfortunately, it also removes duplicate elements and I imagine this isn't your desired behavior. I find it odd that collect_set exists, but no version to keep duplicates. Someone else apparently thought the same thing. It looks like the top and second answer there will give you the UDAF you need.

collect_set actually works as expected since a set as per definition is a collection of well defined and distinct objects i.e. objects occur exactly once or not at all within a set.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!