Spark: Accumulator does not work properly when I use it with range

Submitted by 不想你离开。 on 2021-02-07 10:10:29

Question


I don't understand why my accumulator hasn't been updated properly by Spark.

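import org.apache.spark.{SparkConf, SparkContext}

object AccumulatorsExample extends App {
  // SparkContext built with the config shown below
  val sc = new SparkContext(new SparkConf().setAppName("AccumulatorsExample").setMaster("local[*]"))
  // Spark 1.x accumulator API (deprecated in 2.0, removed in 3.0)
  val acc = sc.accumulator(0L, "acc")
  sc.range(0, 20000, step = 25).map { _ => acc += 1 }.count()
  assert(acc.value == 800) // fails: the value is not 800
}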

My Spark config:

setMaster("local[*]") // should use 8 CPU cores

I'm not sure whether Spark distributes the accumulator updates across all cores, and maybe that's the problem.

My question is: how can I aggregate all acc values into a single sum and get the right accumulator value (800)?

PS

If I restrict the number of cores with setMaster("local[1]"), then everything works fine.


Answer 1:


There are two different issues here:

  • You are extending App instead of implementing a main method. This approach has some known issues, including incorrect accumulator behavior, and because of that it shouldn't be used in Spark applications. This is most likely the source of the problem.

    See for example SPARK-4170 for other possible issues related to extending App.

  • You are using accumulators inside transformations. This means that an accumulator can be incremented an arbitrary number of times (at least once when a given job is successful).

    In general, if you require exact results, you should use accumulators only inside actions like foreach and foreachPartition, although it is rather unlikely that you'll experience any issues in a toy application like this.
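
A minimal sketch combining both suggestions, assuming Spark 2.0 or later: it swaps the deprecated sc.accumulator for sc.longAccumulator, defines an explicit main method instead of extending App, and updates the accumulator inside the foreach action rather than inside map. The run helper is hypothetical, added only so the result can be checked; the app name and the local master are likewise illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch assuming Spark 2.0+ (longAccumulator API); names are illustrative.
object AccumulatorsExample {

  // Hypothetical helper: runs the job and returns the accumulated total.
  def run(sc: SparkContext): Long = {
    // Local value: the closure below captures it directly.
    val acc = sc.longAccumulator("acc")
    // foreach is an action, so each successful task's update is applied once,
    // even if a task is re-executed.
    sc.range(0, 20000, step = 25).foreach(_ => acc.add(1L))
    acc.sum
  }

  // Explicit main method instead of `extends App`.
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("AccumulatorsExample").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val total = run(sc)
      // range(0, 20000, step = 25) has 20000 / 25 = 800 elements
      assert(total == 800L, s"expected 800, got $total")
      println(s"acc = $total")
    } finally {
      sc.stop()
    }
  }
}
```

Keeping acc a local value inside a method means the closure captures the accumulator itself rather than reaching it through a field of an App object whose initialization is deferred.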



Source: https://stackoverflow.com/questions/38422099/spark-accumulators-does-not-work-properly-when-i-use-it-in-range
