How to find max value in pair RDD?

攒了一身酷 2020-12-01 14:30

I have a Spark pair RDD of (key, count) tuples:

Array[(String, Int)] = Array((a,1), (b,2), (c,1), (d,3))

How do I find the key with the highest count?

4 answers
  •  失恋的感觉
    2020-12-01 15:02

    Use takeOrdered(1)(Ordering[Int].reverse.on(_._2)):

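    // In spark-shell, `sc` is the SparkContext the shell already provides.
    val a = Array(("a",1), ("b",2), ("c",1), ("d",3))
    val rdd = sc.parallelize(a)
    // Order pairs by count (_._2), reversed so the largest count comes first,
    // then keep only the first element.
    val maxKey = rdd.takeOrdered(1)(Ordering[Int].reverse.on(_._2))
    // maxKey: Array[(String, Int)] = Array((d,3))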
    

    Quoting the note from RDD.takeOrdered:

    This method should only be used if the resulting array is expected to be small, as all the data is loaded into the driver's memory.
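    With `takeOrdered(1)` the result holds a single element, so the note above is not a concern here. If you want the tuple itself rather than a one-element `Array`, `RDD.max` accepts the same kind of ordering. The sketch below is a minimal example, assuming Spark's Scala API is on the classpath and running in local mode; the names `MaxByCount`, `byCount`, `topPair` and `allTopKeys` are hypothetical. It also handles ties, which `takeOrdered(1)` resolves by returning just one of the tied pairs.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object; the names are illustrative, not part of Spark's API.
object MaxByCount {
  // Order (key, count) pairs by count, the second tuple element.
  val byCount: Ordering[(String, Int)] = Ordering.by[(String, Int), Int](_._2)

  // One pair with the highest count. RDD.max is a reduce, so each partition
  // is reduced first and only one candidate per partition reaches the driver.
  def topPair(rdd: RDD[(String, Int)]): (String, Int) = rdd.max()(byCount)

  // Every key whose count equals the maximum, for when several keys tie.
  def allTopKeys(rdd: RDD[(String, Int)]): Array[String] = {
    val maxCount = rdd.values.max()
    rdd.filter { case (_, n) => n == maxCount }.keys.collect()
  }

  def main(args: Array[String]): Unit = {
    // Local mode for illustration; in spark-shell, reuse the existing `sc`.
    val sc = new SparkContext(new SparkConf().setAppName("MaxByCount").setMaster("local[*]"))
    try {
      val rdd = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 1), ("d", 3)))
      println(topPair(rdd))                   // (d,3)
      println(allTopKeys(rdd).mkString(", ")) // d
    } finally sc.stop()
  }
}
```

    Both `max()` calls throw an exception on an empty RDD, so check `rdd.isEmpty()` first if the input can be empty.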
