How to sort within partitions (and avoid sort across the partitions) using RDD API?


再見小時候 2021-01-02 00:58

It is Hadoop MapReduce shuffle's default behavior to sort the shuffle key within each partition, but not across partitions (it is the total ordering that makes keys sorted cross t

2 Answers

无人及你 (OP)

2021-01-02 01:33

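I've never had this need before, but my first guess would be to use one of the *Partition* methods to do the sorting within every partition: mapPartitions if you want a new RDD with sorted partitions, or foreachPartition if you only need the sorted records for a side effect (it returns Unit, not an RDD).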

Since they give you a Scala Iterator, you could use it.toSeq and then apply any of the sorting methods of Seq, e.g. sortBy or sortWith or sorted.
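A minimal sketch of that per-partition sort, written in plain Scala so it runs without Spark: each inner Vector stands in for one RDD partition, and the sample data is hypothetical. `sortWithinPartition` is an illustrative name, not a Spark API. Converting the iterator to a collection pulls the whole partition into memory, so this assumes each partition fits in executor memory.

```scala
// Plain Scala sketch: each inner Vector stands in for one RDD partition.
// sortWithinPartition (hypothetical name) has the shape mapPartitions expects:
// Iterator[T] => Iterator[U].

// Sort one partition's (key, value) records by key. The iterator is
// materialized with toVector because sorting needs every element; the result
// is handed back as an Iterator.
def sortWithinPartition[K: Ordering, V](it: Iterator[(K, V)]): Iterator[(K, V)] =
  it.toVector.sortBy(_._1).iterator

// Hypothetical sample data: two "partitions" of (key, value) pairs.
val partitions: Vector[Vector[(String, Int)]] = Vector(
  Vector("d" -> 4, "a" -> 1, "c" -> 3),
  Vector("b" -> 2, "e" -> 5, "a" -> 6)
)

// Sort each partition independently. No records move between partitions,
// so there is no global (cross-partition) ordering.
val sorted: Vector[Vector[(String, Int)]] =
  partitions.map(p => sortWithinPartition(p.iterator).toVector)

sorted.zipWithIndex.foreach { case (p, i) => println(s"partition $i: $p") }
```

On a real pair RDD the same idea would look like `rdd.mapPartitions(it => it.toVector.sortBy(_._1).iterator, preservesPartitioning = true)`; passing `preservesPartitioning = true` is reasonable here because the keys are not changed. If you are repartitioning by key anyway, `repartitionAndSortWithinPartitions(partitioner)` on a pair RDD sorts during the shuffle itself, which is closer to the MapReduce behavior described in the question and lets Spark spill to disk instead of holding each partition in a Vector.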
