Is there a way to keep the duplicates in a collected set in Hive, or simulate the sort of aggregate collection that Hive provides using some other method? I want to aggregat
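Here is the Hive query that does this job. It uses collect_list, which keeps duplicates and is available only in Hive 0.13 and later; collect_set would drop them. Replace table_name with your table:
SELECT hash_id, collect_list(num_of_cats) FROM table_name GROUP BY hash_id;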
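To see the difference between the two aggregates side by side, here is a minimal sketch. The table cats_demo and its sample rows are hypothetical, made up only for illustration, and the INSERT ... VALUES form assumes Hive 0.14 or later (on 0.13, load the rows some other way).

```sql
-- Hypothetical demo table; the name and rows are assumptions, not from the question.
CREATE TABLE cats_demo (hash_id STRING, num_of_cats INT);

-- INSERT ... VALUES requires Hive 0.14+.
INSERT INTO TABLE cats_demo VALUES ('a', 1), ('a', 1), ('a', 2), ('b', 3);

-- collect_list keeps every value, including repeats;
-- collect_set keeps only the distinct values.
SELECT hash_id,
       collect_list(num_of_cats) AS with_duplicates,
       collect_set(num_of_cats)  AS without_duplicates
FROM cats_demo
GROUP BY hash_id;
```

For hash_id 'a', with_duplicates should hold three elements (1, 1 and 2) while without_duplicates holds two (1 and 2). Hive does not guarantee the order of elements in either array.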