COLLECT_SET() in Hive, keep duplicates?


Is there a way to keep the duplicates in a collected set in Hive, or simulate the sort of aggregate collection that Hive provides using some other method? I want to aggregat

9 Answers

佛祖请我去吃肉

2020-12-12 17:50

Here is the exact Hive query that does this job (works only in Hive 0.13 and later, where collect_list() is available; collect_set() would drop the duplicates):

SELECT hash_id, collect_list(num_of_cats) FROM <table> GROUP BY hash_id;

滥情空心

2020-12-12 17:57

As of Hive 0.13, there is a built-in UDAF called collect_list() that achieves this. It is listed with the built-in aggregate functions in the Hive language manual.
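
To make the difference concrete, here is a minimal sketch that runs both aggregates side by side. The table name `cat_counts` and the sample rows are assumptions for illustration only; the thread gives just the column names `hash_id` and `num_of_cats`. Note that `INSERT ... VALUES` needs Hive 0.14 or later, so on 0.13 the sample rows would have to be loaded some other way.

```sql
-- Hypothetical table: only the column names come from the thread.
CREATE TABLE cat_counts (hash_id STRING, num_of_cats INT);

-- Sample rows (INSERT ... VALUES requires Hive 0.14+).
INSERT INTO TABLE cat_counts VALUES ('a', 1), ('a', 1), ('a', 2), ('b', 3);

-- collect_set() removes duplicates within each group;
-- collect_list() keeps every value, duplicates included.
SELECT hash_id,
       collect_set(num_of_cats)  AS distinct_counts,
       collect_list(num_of_cats) AS all_counts
FROM cat_counts
GROUP BY hash_id;
```

For `hash_id = 'a'`, `distinct_counts` should contain 1 and 2 once each, while `all_counts` should contain 1, 1 and 2. Hive does not guarantee the order of elements in either array.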

半阙折子戏

2020-12-12 18:01

Check out the Brickhouse collect UDAF ( http://github.com/klout/brickhouse/blob/master/src/main/java/brickhouse/udf/collect/CollectUDAF.java )

It also supports collecting into a map. Brickhouse also contains many useful UDFs that are not in the standard Hive distribution.
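
A rough sketch of how this UDAF might be wired up, for Hive versions that lack collect_list(). The class name is taken from the source path in the link above; the jar path, the function alias `collect`, and the table `cat_counts` are placeholders, and the two-argument map form reflects my understanding of Brickhouse rather than anything stated in this thread, so check it against the Brickhouse documentation before relying on it.

```sql
-- Placeholder path: point this at wherever the Brickhouse jar actually lives.
ADD JAR /path/to/brickhouse.jar;

-- Class name derived from the linked source file (CollectUDAF.java).
CREATE TEMPORARY FUNCTION collect AS 'brickhouse.udf.collect.CollectUDAF';

-- One argument: collect values into an array, keeping duplicates.
-- cat_counts is a hypothetical table with the columns used in the thread.
SELECT hash_id, collect(num_of_cats) AS all_counts
FROM cat_counts
GROUP BY hash_id;

-- Two arguments (assumed form): collect key/value pairs into a map.
SELECT collect(hash_id, num_of_cats) AS cats_by_hash
FROM cat_counts;
```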
