COLLECT_SET() in Hive, keep duplicates?

离开以前 2020-12-12 17:06

Is there a way to keep the duplicates in a collected set in Hive, or simulate the sort of aggregate collection that Hive provides using some other method? I want to aggregat

9 Answers
  • 2020-12-12 17:50

    Here is the Hive query that does this job (it needs Hive 0.13 or later; table_name stands in for your own table):

    SELECT hash_id, collect_list(num_of_cats) FROM table_name GROUP BY hash_id;

  • 2020-12-12 17:57

    As of Hive 0.13, there is a built-in UDAF called collect_list() that achieves this. Unlike collect_set(), it does not remove duplicates.
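
    To make the difference concrete, here is a minimal sketch, assuming a hypothetical table cat_counts with columns hash_id and num_of_cats (the table name and the column aliases are illustrative, not from the question). collect_set() drops repeated values within each group, while collect_list() keeps them; neither guarantees the order of the elements.

    ```sql
    -- Assumed table: cat_counts(hash_id, num_of_cats); the name is hypothetical.
    SELECT hash_id,
           collect_set(num_of_cats)  AS distinct_cats,  -- duplicates removed
           collect_list(num_of_cats) AS all_cats        -- duplicates kept (Hive 0.13+)
    FROM cat_counts
    GROUP BY hash_id;
    ```

    For a group whose num_of_cats values are 2, 2 and 5, distinct_cats holds two elements and all_cats holds three.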

  • 2020-12-12 18:01

    Check out the Brickhouse collect UDAF (http://github.com/klout/brickhouse/blob/master/src/main/java/brickhouse/udf/collect/CollectUDAF.java).

    It also supports collecting into a map. Brickhouse also contains many useful UDFs that are not in the standard Hive distribution.
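
    A rough sketch of how this could be wired up, assuming the Brickhouse jar is already built and available to your session. The jar path, the table cat_counts and the column cat_name are hypothetical; the class name is taken from the source path linked above. As I understand it, the one-argument form collects into an array and the two-argument form collects into a map.

    ```sql
    -- Hypothetical jar location; adjust to wherever your Brickhouse build lives.
    ADD JAR /path/to/brickhouse.jar;

    -- Register the UDAF under the name "collect" for this session.
    CREATE TEMPORARY FUNCTION collect AS 'brickhouse.udf.collect.CollectUDAF';

    -- One argument: gather values into an array, duplicates included.
    SELECT hash_id, collect(num_of_cats) AS all_cats
    FROM cat_counts
    GROUP BY hash_id;

    -- Two arguments: gather key/value pairs into a map (cat_name is an assumed column).
    SELECT hash_id, collect(cat_name, num_of_cats) AS cats_by_name
    FROM cat_counts
    GROUP BY hash_id;
    ```

    On Hive 0.13 or later the built-in collect_list() covers the array case, so the extra jar mainly pays off on older versions or when you need the map form.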
