How to find max value in pair RDD?

攒了一身酷 2020-12-01 14:30

I have a Spark pair RDD of (key, count) as below:

Array[(String, Int)] = Array((a,1), (b,2), (c,1), (d,3))

How do I find the key with the highest count?

4 answers
  •  盖世英雄少女心
    2020-12-01 15:10

    For PySpark:

    Let a be a pair RDD with string keys and integer values. Then

    a.max(key=lambda x: x[1])
    

    returns the key-value pair with the maximum value. The max function orders the elements by the return value of the lambda function.

    Here a is a pair RDD with elements such as ('key', int), and x[1] refers to the integer part of each element.

    Note that max called without a key function compares the tuples themselves, so it orders by key first and returns the pair with the largest key, not the largest value.

    Documentation is available at https://spark.apache.org/docs/1.5.0/api/python/pyspark.html#pyspark.RDD.max
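
The difference between the two orderings can be checked locally without a Spark cluster. The following is a minimal sketch, not Spark code: it uses a plain Python list in place of the RDD, and `rdd_style_max` is a hypothetical helper that combines elements pairwise with the built-in `max`, which is roughly how a distributed max can be reduced. The `skewed` list is made-up data, added only because in the question's sample the largest key and the largest count happen to be the same pair.

```python
from functools import reduce

# Local stand-in for the pair RDD from the question (a plain list, no Spark).
pairs = [("a", 1), ("b", 2), ("c", 1), ("d", 3)]


def rdd_style_max(items, key=None):
    """Hypothetical helper: reduce the items pairwise with the built-in max."""
    if key is None:
        # No key function: tuples are compared element by element, key first.
        return reduce(max, items)
    # With a key function: order by whatever the function returns.
    return reduce(lambda x, y: max(x, y, key=key), items)


# Order by the count (second element of each pair).
by_value = rdd_style_max(pairs, key=lambda x: x[1])
print(by_value)     # the pair with the highest count
print(by_value[0])  # only the key of that pair

# Made-up data where the largest key and the largest count differ.
skewed = [("a", 9), ("d", 3)]
print(rdd_style_max(skewed))                      # largest key wins
print(rdd_style_max(skewed, key=lambda x: x[1]))  # largest count wins
```

On a real RDD the equivalent call would be a.max(key=lambda x: x[1]), with [0] appended to keep only the key.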
