Is it always the case that the Driver must be on a Master node (Yes/No)? Apache Spark

Submitted on 2019-12-11 05:05:40

Question


Is it always the case that the Driver (the program that runs the application's main method and coordinates the job) must run on a master node?

For example, if I set up EC2 with one master and two workers, must my code containing the main method be executed from the master EC2 instance?

If the answer is NO, what would be the best way to set up the system so that the driver is outside the EC2 master node (let's say the Driver runs on my computer, while the Master and Workers are on EC2)? Do I always have to use spark-submit, or can I do it from an IDE such as Eclipse or IntelliJ IDEA?

If the answer is YES, what would be the best reference to learn more about it (since I need to provide some sort of proof)?

Thank you kindly for your answer, references would be highly appreciated!


Answer 1:


No, it doesn't have to be on the master.

With spark-submit, the --deploy-mode option controls where your driver runs: in client mode it runs on the machine you invoke spark-submit from (which could be the master or any other machine), and in cluster mode it runs on one of the workers.
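
For example, a minimal sketch of both modes against a standalone cluster; the hostname ec2-master-host, the class com.example.MyApp, and the jar name are hypothetical placeholders:

    # Client mode: the driver runs on the machine where you invoke spark-submit
    ./bin/spark-submit \
      --master spark://ec2-master-host:7077 \
      --deploy-mode client \
      --class com.example.MyApp \
      my-app.jar

    # Cluster mode: the driver runs on one of the worker nodes
    ./bin/spark-submit \
      --master spark://ec2-master-host:7077 \
      --deploy-mode cluster \
      --class com.example.MyApp \
      my-app.jar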

There is ongoing network communication between the driver and the workers, so you want the driver 'close' to the workers; never run it across a WAN.

You can also work from inside a REPL (spark-shell), which some IDEs can hook into. If you're using a dynamic language like Clojure, you can just create a SparkContext that references (through the master URL) a local cluster, or the cluster you want to submit jobs to, and then code interactively through the REPL. In practice it isn't quite this easy.
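
A minimal sketch of the same idea in Scala, connecting to a standalone cluster from a driver launched outside it (e.g. from an IDE); the master URL and jar path are hypothetical placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    // Connect to a standalone cluster from a driver running outside it.
    // "spark://ec2-master-host:7077" and the jar path are placeholders.
    val conf = new SparkConf()
      .setAppName("driver-outside-cluster")
      .setMaster("spark://ec2-master-host:7077") // standalone master URL
      .setJars(Seq("target/my-app.jar"))         // ships your code to the executors
    val sc = new SparkContext(conf)

    // Trivial job to verify that the executors are reachable.
    println(sc.parallelize(1 to 100).sum())

    sc.stop()

Note that the executors connect back to the driver, so the driver machine must be reachable from the workers; this is one reason running the driver across a WAN is a bad idea.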



Source: https://stackoverflow.com/questions/30022086/is-it-always-the-case-that-driver-must-be-on-a-master-node-yes-no-apache-spa
