Explain the aggregate functionality in Spark

误落风尘 2020-12-07 12:23

I am looking for a better explanation of the aggregate functionality that is available via Spark in Python.

The example I have is as follows (using pyspark):
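The original code snippet is cut off in this copy. Based on the Scala answer below, which states that it uses the same input and produces the same result, the call in question presumably looked something like the following sketch (assuming `sc` is an existing SparkContext): the zero value (0, 0) carries a running (sum, count), the first function folds values into it within each partition, and the second merges the per-partition accumulators.

    rdd = sc.parallelize([1, 2, 3, 4])

    # zeroValue (0, 0) = (running sum, running count)
    # seqOp folds each element into the accumulator within a partition;
    # combOp merges the accumulators produced by different partitions.
    result = rdd.aggregate(
        (0, 0),
        lambda acc, value: (acc[0] + value, acc[1] + 1),
        lambda acc1, acc2: (acc1[0] + acc2[0], acc1[1] + acc2[1]))

    print(result)  # (10, 4)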

9 Answers
  •  醉酒成梦
    2020-12-07 12:59

    For people looking for the Scala equivalent of the example above, here it is: same logic, same input, same result.

    scala> val listRDD = sc.parallelize(List(1,2,3,4), 2)
    listRDD: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[2] at parallelize at <console>:21
    
    scala> listRDD.collect()
    res7: Array[Int] = Array(1, 2, 3, 4)
    
    scala> listRDD.aggregate((0,0))((acc, value) => (acc._1+value,acc._2+1),(acc1,acc2) => (acc1._1+acc2._1,acc1._2+acc2._2))
    res10: (Int, Int) = (10,4)
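
    To see what the two functions are doing, here is a plain-Python sketch (no Spark required) that simulates the computation above: each partition is folded with the first function starting from its own copy of the zero value, and the partial results are then merged with the second function. The split into two partitions of [1, 2] and [3, 4] is an illustrative assumption.

    from functools import reduce

    data = [1, 2, 3, 4]
    partitions = [data[:2], data[2:]]   # assume 2 partitions: [1, 2] and [3, 4]

    zero = (0, 0)                                         # (running sum, running count)
    seq_op = lambda acc, v: (acc[0] + v, acc[1] + 1)      # folds values within one partition
    comb_op = lambda a, b: (a[0] + b[0], a[1] + b[1])     # merges per-partition results

    # Fold each partition from the zero value, then merge the partial results.
    partials = [reduce(seq_op, part, zero) for part in partitions]   # [(3, 2), (7, 2)]
    print(reduce(comb_op, partials, zero))                           # (10, 4)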
    
