How are Spark Executors launched if Spark (on YARN) is not installed on the worker nodes?

Submitted by 懵懂的女人 on 2019-12-11 08:07:40

Question


I have a question regarding Apache Spark running on YARN in cluster mode. According to this thread, Spark itself does not have to be installed on every (worker) node in the cluster. My problem is with the Spark executors: in general, YARN, or rather its Resource Manager, decides on resource allocation, so Spark executors could be launched on any (worker) node in the cluster. But then, how can YARN launch Spark executors if Spark is not installed on any (worker) node?


Answer 1:


At a high level, when a Spark application is launched on YARN:

  1. An Application Master (Spark-specific) is created in one of the YARN containers.
  2. The other YARN containers are used for the Spark workers (executors); see the example launch command after this list.
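
For illustration, a typical cluster-mode launch might look like the following. The --class value and jar name are placeholders, and the resource flags are only example values:

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --num-executors 4 \
      --executor-memory 2g \
      --class com.example.MyApp \
      my-app.jar

With --deploy-mode cluster, the driver runs inside the Application Master's container, and YARN's Resource Manager chooses the nodes on which the executor containers are started.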

The Spark driver then passes serialized actions (code) to the executors, which process the data.
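
As a minimal sketch of this mechanism (the object and app names below are made up), the closure passed to map is serialized on the driver and executed on the executors:

    import org.apache.spark.sql.SparkSession

    object ClosureDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("closure-demo").getOrCreate()
        val sc = spark.sparkContext

        // The closure x => x * x is serialized by the driver and shipped
        // to the executors, which apply it to their data partitions.
        val sumOfSquares = sc.parallelize(1 to 1000, 4)
          .map(x => x * x)   // runs on the executors
          .reduce(_ + _)     // partial results are combined on the driver

        println(s"sum of squares = $sumOfSquares")
        spark.stop()
      }
    }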

The spark-assembly jar supplies the Spark-related classes needed to run Spark jobs on a YARN cluster; the application ships its own functional jars separately.


Edit (2017-01-04):

Spark 2.0 no longer requires a fat assembly jar for production deployment (source).
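
In Spark 2.x you can instead stage the individual Spark jars (or an archive of them) somewhere YARN can localize them, such as HDFS; a sketch, assuming an illustrative HDFS path:

    # spark-defaults.conf -- the HDFS path below is only an example
    spark.yarn.archive   hdfs:///spark/spark-libs.jar
    # alternatively, point at the jars directly:
    # spark.yarn.jars    hdfs:///spark/jars/*.jar

If neither property is set, spark-submit packages the jars under $SPARK_HOME/jars on the client machine and uploads them to the application's staging directory on HDFS, which is why Spark does not have to be pre-installed on the worker nodes.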



Source: https://stackoverflow.com/questions/41180808/how-are-spark-executors-launched-if-spark-on-yarn-is-not-installed-on-the-work
