I have a Spark pair RDD of (key, count) elements, as below:
Array[(String, Int)] = Array((a,1), (b,2), (c,1), (d,3))
How do I find the key with the highest count?
For PySpark:
Let a be the pair RDD with string keys and integer values. Then
a.max(lambda x:x[1])
returns the key-value pair with the maximum value. The max function compares elements by the return value of the lambda (its key function) and returns the element for which that value is largest.
Here a is a pair RDD with elements such as ('key', int), and x[1] refers to the integer part of each element.
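Note that max called with no argument compares the tuples themselves, so it orders by key first (with ties broken by the value) and returns the pair with the largest key, not the pair with the largest count.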
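To see the difference between the two calls without starting Spark, here is a small plain-Python sketch. It assumes that RDD.max applies its key function the same way Python's built-in max does, and it uses a hypothetical data set (not the one in the question) where the largest key and the largest count belong to different pairs. The names pairs, by_count, max_by_count and max_by_key are just for illustration.

```python
from functools import reduce

# Hypothetical sample data: the largest key ('d') and the largest count (5)
# sit in different pairs, so the two orderings give different answers.
pairs = [("a", 1), ("b", 5), ("c", 1), ("d", 3)]


def by_count(pair):
    """Key function: compare pairs by their integer count."""
    return pair[1]


# Local analogue of a.max(lambda x: x[1]):
# keep whichever of two pairs has the larger count.
max_by_count = reduce(lambda p, q: max(p, q, key=by_count), pairs)

# Local analogue of a.max() with no argument:
# tuples compare element by element, so the key string decides first.
max_by_key = reduce(max, pairs)

print(max_by_count)  # ('b', 5)
print(max_by_key)    # ('d', 3)
```

If you only need the key, take element 0 of the result, e.g. max_by_count[0].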
Documentation is available at https://spark.apache.org/docs/1.5.0/api/python/pyspark.html#pyspark.RDD.max