Is there a way to keep the duplicates in a collected set in Hive, or simulate the sort of aggregate collection that Hive provides using some other method? I want to aggregat
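For context, Hive has two collecting aggregates. COLLECT_SET drops duplicates. COLLECT_LIST (Hive 0.13.0 and later) keeps them. The sketch below shows both side by side. The table `cats(hash_id, num_of_cats)` is a hypothetical example. Neither function guarantees the order of elements in the resulting array.

```sql
-- Hypothetical table: cats(hash_id STRING, num_of_cats INT)
SELECT
  hash_id,
  COLLECT_SET(num_of_cats)  AS distinct_vals,  -- duplicates removed
  COLLECT_LIST(num_of_cats) AS all_vals        -- duplicates kept
FROM cats
GROUP BY hash_id;
```

If a hash_id has the rows 1, 1 and 2, distinct_vals holds each value once, while all_vals holds all three elements.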
Just wondering about the following statement:
SELECT hash_id, COLLECT_LIST(num_of_cats) AS aggr_set
FROM tablename
WHERE blablabla
GROUP BY hash_id;
If we want the collected num_of_cats elements sorted, and only a limited number of them kept, how do we go about it? In big data we deal with petabytes of data, so in such cases we may not need every element, only the top 10 or some other fixed number.
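One common pattern is to rank the rows inside each hash_id before aggregating, keep only the first N, and then collect them. The sketch below assumes "top 10" means the 10 largest num_of_cats values per hash_id. The original filter is elided in the question, so it is left as a commented placeholder. ROW_NUMBER() requires window-function support (Hive 0.11.0 and later).

```sql
SELECT
  hash_id,
  -- SORT_ARRAY sorts ascending, because COLLECT_LIST itself does not guarantee any order
  SORT_ARRAY(COLLECT_LIST(num_of_cats)) AS top10_cats
FROM (
  SELECT
    hash_id,
    num_of_cats,
    -- number the rows within each hash_id, largest num_of_cats first
    ROW_NUMBER() OVER (PARTITION BY hash_id ORDER BY num_of_cats DESC) AS rn
  FROM tablename
  -- WHERE blablabla   -- the original filter from the question goes here
) ranked
WHERE rn <= 10          -- keep at most 10 rows per hash_id before collecting
GROUP BY hash_id;
```

Filtering before the GROUP BY means only the retained rows are shipped to the aggregation, rather than the full list being built and trimmed afterwards. ROW_NUMBER breaks ties arbitrarily. Swapping in RANK() keeps tied values, so a group can then contain more than 10 elements.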