COLLECT_SET() in Hive, keep duplicates?


离开以前 2020-12-12 17:06

Is there a way to keep the duplicates in a collected set in Hive, or simulate the sort of aggregate collection that Hive provides using some other method? I want to aggregat

9 answers

既然无缘 (OP)

2020-12-12 17:42
Just wondering: if, in this statement,
```
SELECT
    hash_id, COLLECT_LIST(num_of_cats) AS aggr_set
FROM
    tablename
WHERE
    blablabla
GROUP BY
    hash_id
;
```
we want to sort num_of_cats and limit the number of elements collected per group, how do we go about it? In big data we deal with petabytes of data, so in such cases we might not need all of it, only the top 10 or some other limit.
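One possible approach, sketched under a few assumptions: rank the rows within each hash_id using a window function, keep only the top 10, and then collect. This assumes "top 10" means the 10 largest num_of_cats values per hash_id and a Hive version that has both windowing functions and COLLECT_LIST. The alias `ranked` and the column `rn` are names made up for this illustration. COLLECT_LIST does not promise any element order, so SORT_ARRAY is applied afterwards, which returns the array in ascending order.

```sql
-- Sketch only: keep the 10 largest num_of_cats per hash_id, duplicates included.
SELECT
    hash_id,
    SORT_ARRAY(COLLECT_LIST(num_of_cats)) AS aggr_set  -- ascending order
FROM (
    SELECT
        hash_id,
        num_of_cats,
        -- rn = 1 is the largest num_of_cats within each hash_id
        ROW_NUMBER() OVER (
            PARTITION BY hash_id
            ORDER BY num_of_cats DESC
        ) AS rn
    FROM
        tablename
    -- put the original WHERE filter here, inside the subquery
) ranked
WHERE
    rn <= 10
GROUP BY
    hash_id
;
```

Because ROW_NUMBER gives tied values distinct ranks, duplicates are kept and each group holds at most 10 elements. If ties at the cutoff should all be included, RANK() could be used instead, at the cost of groups that may exceed 10 elements.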