reduceByKey: How does it work internally?

Happy的楠姐 2020-12-04 09:22

I am new to Spark and Scala, and I was confused about the way the reduceByKey function works in Spark. Suppose we have the following code:

    val lines  = sc.textFile("...")
    val pairs  = lines.map(s => (s, 1))
    val counts = pairs.reduceByKey((accumulator, n) => accumulator + n)

4 Answers
  •  执念已碎 2020-12-04 10:02

    Spark's RDD reduceByKey function merges the values for each key using an associative and commutative reduce function.
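
    For reference, this is the core signature on pair RDDs (it lives in PairRDDFunctions; Spark also provides overloads that take a partitioner or a number of partitions):

        def reduceByKey(func: (V, V) => V): RDD[(K, V)]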

    The reduceByKey function works only on pair RDDs (RDDs of key/value tuples), and it is a transformation, which means it is lazily evaluated: nothing is computed until an action is called on the resulting RDD. The reduce function passed as a parameter is applied to the source RDD and produces a new RDD as the result.
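
    A minimal sketch of that laziness (the local master and the sample data here are made up for illustration):

        import org.apache.spark.{SparkConf, SparkContext}

        val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("reduceByKeyDemo"))

        val pairs  = sc.parallelize(Seq(("s1", 1), ("s2", 1), ("s1", 1)))
        val counts = pairs.reduceByKey((accumulator, n) => accumulator + n)  // transformation: no job runs yet

        counts.collect().foreach(println)  // action: this call is what triggers the computation
        // (s1,2)
        // (s2,1)   -- output order may vary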

    So in your example, the RDD pairs holds paired elements like (s1,1), (s2,1), and so on, and reduceByKey is given the function (accumulator, n) => (accumulator + n). For each key, the first value encountered seeds the accumulator (there is no separate zero value, unlike foldByKey), and every subsequent value for that key is folded in with the function. The result is the RDD counts, which pairs each key with its total count.
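
    To make that concrete, here is a plain-Scala sketch of the per-key folding that reduceByKey performs (no Spark required; the sample data is made up): within each key's group, the first value seeds the accumulator and the remaining values are folded in.

        val data = Seq(("s1", 1), ("s2", 1), ("s1", 1), ("s3", 1), ("s1", 1))

        val counts = data
          .groupBy(_._1)                       // gather all values under their key
          .map { case (key, kvs) =>
            val values = kvs.map(_._2)         // e.g. Seq(1, 1, 1) for "s1"
            key -> values.reduce((accumulator, n) => accumulator + n)
          }

        println(counts)  // Map(s1 -> 3, s2 -> 1, s3 -> 1), map ordering may vary

    On a real cluster, reduceByKey also runs this function map-side within each partition before shuffling (much like a combiner in MapReduce), which is why the function is required to be associative and commutative.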
