Differentiate driver code and worker code in Apache Spark

迷失自我 2021-01-05 05:11

In an Apache Spark program, how do we know which part of the code will execute in the driver program and which part will execute on the worker nodes?

With Regards

2 Answers
  •  Happy的楠姐
    2021-01-05 05:42

    It is actually pretty simple. Everything that happens inside the closure created by a transformation happens on a worker. This means that anything passed inside map(...), filter(...), mapPartitions(...), groupBy*(...), or aggregateBy*(...) is executed on the workers. That includes reading data from persistent storage or remote sources.
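    To make this concrete, here is a minimal, simulated sketch of that split. The assumption is that there is no real Spark cluster: the "worker side" is modeled as a plain Python function applied to each partition, while the driver merely plans the partitioning. The names (`worker_side`, `partitions`) are illustrative, not Spark API.

    ```python
    # Simulated sketch: no cluster; worker execution is modeled as
    # plain function calls applied to each partition.
    data = list(range(1, 11))
    partitions = [data[:5], data[5:]]       # driver side: only plans the split

    def worker_side(partition):
        # Everything inside map(...)/filter(...) closures would run here,
        # on an executor, one partition at a time.
        return [x * x for x in partition if x % 2 == 0]

    results = [worker_side(p) for p in partitions]
    # → [[4, 16], [36, 64, 100]]
    ```

    In real Spark the closure passed to map/filter is serialized and shipped to the executors, which is why it must not reference driver-only objects.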

    Actions like count, reduce(...), and fold(...) are usually executed on both the driver and the workers. The heavy lifting is performed in parallel by the workers, and some final steps, such as merging the partial outputs received from the workers, are performed sequentially on the driver.
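    That two-phase pattern can be sketched as follows, again without a cluster (an assumption): each "worker" reduces its own partition in parallel, and the driver only combines the small per-partition results.

    ```python
    from functools import reduce

    partitions = [[1, 2, 3], [4, 5], [6]]       # data already distributed

    # Worker side: each executor reduces only its own partition.
    partials = [sum(p) for p in partitions]     # → [6, 9, 6]

    # Driver side: a cheap, sequential merge of the partial results.
    total = reduce(lambda a, b: a + b, partials)
    # total == 21
    ```

    This is why the merge function given to reduce(...) must be associative: the per-partition order in which the driver receives partial results is not guaranteed.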

    Everything else, such as triggering an action or defining a transformation, happens on the driver. In particular, this includes any operation that requires access to the SparkContext. In PySpark it also involves communication with the Py4J gateway.
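    One consequence: a closure that captures the SparkContext cannot be shipped to the workers, because the context is not serializable. The stdlib sketch below illustrates the same failure mode with plain pickle; `FakeContext` is a hypothetical stand-in holding a thread lock, which, like a SparkContext, cannot be pickled.

    ```python
    import pickle
    import threading

    class FakeContext:
        """Hypothetical stand-in for SparkContext: holds unpicklable state."""
        def __init__(self):
            self._lock = threading.Lock()   # locks cannot be pickled

    serializable = True
    try:
        pickle.dumps(FakeContext())         # what Spark would try to ship
    except TypeError:
        serializable = False                # → False: cannot leave the driver
    ```

    Real Spark raises a similar serialization error if a map/filter closure references the SparkContext, which is why such code must stay on the driver.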
