COLLECT_SET() in Hive, keep duplicates?

离开以前 2020-12-12 17:06

Is there a way to keep the duplicates in a collected set in Hive, or simulate the sort of aggregate collection that Hive provides using some other method? I want to aggregat

9 answers
  •  既然无缘
    2020-12-12 17:42

    Just wondering: if, in the statement

    SELECT
        hash_id, COLLECT_LIST(num_of_cats) AS aggr_set
    FROM
        tablename
    WHERE
        blablabla
    GROUP BY
        hash_id
    ;
    

    we want to sort the num_of_cats elements and limit how many are collected, how would we go about it? In big data we deal with petabytes of data, so in such cases we might not need all of it, only the top 10 or some other limited number.
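    One possible approach is to rank the rows inside each hash_id with ROW_NUMBER(), keep only the first 10, and then collect them. The sketch below assumes Hive 0.13 or later (needed for COLLECT_LIST). The table and column names come from the query above. The cutoff of 10, the descending ranking, and the alias `ranked` are illustrative choices, and the original WHERE filter is left as a placeholder comment. COLLECT_LIST keeps duplicates but does not guarantee element order, so the result is wrapped in sort_array(), which sorts ascending.

    ```sql
    -- Sketch: keep at most 10 num_of_cats values per hash_id (duplicates allowed).
    -- Assumes Hive 0.13+ for COLLECT_LIST; the cutoff of 10 and DESC ranking are illustrative.
    SELECT
        hash_id,
        -- COLLECT_LIST keeps duplicates, but its element order is not guaranteed,
        -- so sort_array() imposes an ascending order on the collected array.
        sort_array(COLLECT_LIST(num_of_cats)) AS aggr_list
    FROM (
        SELECT
            hash_id,
            num_of_cats,
            -- Number the rows within each hash_id, largest num_of_cats first.
            ROW_NUMBER() OVER (PARTITION BY hash_id ORDER BY num_of_cats DESC) AS rn
        FROM
            tablename
        -- WHERE <your original filter goes here>
    ) ranked
    WHERE
        rn <= 10
    GROUP BY
        hash_id
    ;
    ```

    ROW_NUMBER() breaks ties arbitrarily, so each group gets exactly 10 rows (or fewer). If you use RANK() instead, all rows tied at the cutoff are kept, so a group can end up with more than 10 elements.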
