I am looking for a better explanation of the aggregate functionality that is available via Spark in Python.
The example I have is as follows (using pyspark).
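(The original pyspark snippet is not reproduced here; the following is a minimal sketch of such an aggregate call, assuming it is run in the pyspark shell where sc is predefined, that yields the same (10, 4) result as the Scala version below.)

>>> listRDD = sc.parallelize([1, 2, 3, 4], 2)
>>> # seqOp: fold each element of a partition into a (sum, count) accumulator
>>> seqOp = lambda acc, value: (acc[0] + value, acc[1] + 1)
>>> # combOp: merge the per-partition (sum, count) accumulators
>>> combOp = lambda acc1, acc2: (acc1[0] + acc2[0], acc1[1] + acc2[1])
>>> listRDD.aggregate((0, 0), seqOp, combOp)
(10, 4)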
For people looking for the Scala equivalent of the above example, here it is: same logic, same input, same result.
scala> val listRDD = sc.parallelize(List(1,2,3,4), 2)
listRDD: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[2] at parallelize at <console>:21
scala> listRDD.collect()
res7: Array[Int] = Array(1, 2, 3, 4)
scala> listRDD.aggregate((0, 0))((acc, value) => (acc._1 + value, acc._2 + 1), (acc1, acc2) => (acc1._1 + acc2._1, acc1._2 + acc2._2))
res10: (Int, Int) = (10,4)
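With two partitions, the seqOp folds each partition locally (for this input, typically [1, 2] → (3, 2) and [3, 4] → (7, 2)), and the combOp then merges the per-partition (sum, count) pairs into the final (10, 4).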