Spark: Difference between accumulator and local variable

穿精又带淫゛_ submitted on 2019-12-01 14:29:06

counter is a local variable. It may appear to work in your local program with .master("local[3]"), where the driver and the tasks run in the same JVM. Now imagine you are running in YARN mode. All the logic then runs in a distributed way: each task works on its own deserialized copy of counter, so the variable on the driver is never updated. The accumulator is updated, because it is a shared variable designed for distributed use. Suppose you have 2 executors running the program. Each one adds to its own partial copy of cntAccum, and Spark merges those partial results back on the driver. So cntAccum.value, read on the driver, reflects the updates from all executors in YARN distributed mode, whereas the local variable counter cannot.

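This works because tasks can add to an accumulator and the driver can read its merged value; the tasks themselves cannot read it. See the accumulators section of the Spark documentation.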
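To make the difference concrete, here is a minimal sketch in Scala, assuming Spark 2.x or later. The names counter and cntAccum come from the question; the object name AccumulatorVsLocal, the helper run, and the sample data are made up for illustration only.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; only counter and cntAccum come from the question.
object AccumulatorVsLocal {

  // Returns (driver-side value of counter, driver-side value of cntAccum).
  def run(spark: SparkSession): (Int, Long) = {
    val sc = spark.sparkContext
    val data = sc.parallelize(1 to 100, numSlices = 4) // illustrative data

    var counter = 0                               // plain local variable on the driver
    val cntAccum = sc.longAccumulator("cntAccum") // accumulator registered on the driver

    data.foreach { _ =>
      counter += 1    // changes the copy of counter shipped with the task closure
      cntAccum.add(1) // partial sums are sent back and merged on the driver
    }

    val total: Long = cntAccum.value // only the driver should read the value
    (counter, total)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("accumulator-vs-local")
      .master("local[3]") // assumption: swap for a YARN master when submitting to a cluster
      .getOrCreate()
    try {
      val (counter, total) = run(spark)
      println(s"counter = $counter, cntAccum = $total")
    } finally {
      spark.stop()
    }
  }
}
```

Because the updates happen inside an action (foreach), cntAccum should come back as 100 here. The value of counter seen by the driver is not something to rely on: the Spark programming guide describes the behavior of mutating a captured variable from inside a closure as undefined.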

In the image, the executor ID is localhost. If you are using YARN with 2-3 executors, it will show the individual executor IDs. Hope that helps.
