Question
Is there any difference or priority between specifying Spark application configuration in the code:
SparkConf().setMaster("yarn")
and specifying it on the command line:
spark-submit --master yarn
Answer 1:
Yes. The highest priority is given to configuration set in the user's code with the set() functions on SparkConf; after that come the flags passed to spark-submit. The official documentation states:
Properties set directly on the SparkConf take highest precedence, then flags passed to spark-submit or spark-shell, then options in the spark-defaults.conf file. A few configuration keys have been renamed since earlier versions of Spark; in such cases, the older key names are still accepted, but take lower precedence than any instance of the newer key.
Source
Answer 2:
There are 4 precedence levels (1 to 4, 1 being the highest priority):
- SparkConf set in the application code
- Properties passed as flags to spark-submit
- Properties given in a properties file, which can be passed as an argument at submission time
- Default values
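As a conceptual sketch only (this is not Spark's actual implementation, and the values below are illustrative), the four levels behave like a first-non-empty-wins chain:

```shell
#!/bin/sh
# Conceptual sketch of Spark's config precedence for spark.master.
# Each variable stands in for one of the four sources; values are made up.
code_conf="yarn"            # 1. SparkConf().set(...) in the application
submit_flag="local[2]"      # 2. --master flag passed to spark-submit
file_prop="spark://h:7077"  # 3. spark.master in a properties file
default_val="local[*]"      # 4. built-in default

# Print the first non-empty argument, mimicking highest-priority-wins.
resolve() {
  for v in "$@"; do
    if [ -n "$v" ]; then
      printf '%s\n' "$v"
      return
    fi
  done
}

resolve "$code_conf" "$submit_flag" "$file_prop" "$default_val"  # the in-code value wins
code_conf=""
resolve "$code_conf" "$submit_flag" "$file_prop" "$default_val"  # now the submit flag wins
```

With the in-code setting present, it shadows everything else; clearing it lets the spark-submit flag take effect, and so on down the chain.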
Answer 3:
Other than the priority, specifying it on the command line allows you to run the application on different cluster managers without modifying code: the same application can run on local[n], yarn, mesos, or a Spark standalone cluster.
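For example (the application name and cluster URLs below are placeholders, not commands to run verbatim):

```shell
# The same unmodified application, pointed at different cluster managers
# purely via the command line:
spark-submit --master "local[4]"        app.py   # local mode, 4 threads
spark-submit --master yarn              app.py   # YARN
spark-submit --master mesos://host:5050 app.py   # Mesos
spark-submit --master spark://host:7077 app.py   # standalone cluster
```

Note this only works if the application does not hardcode the master with setMaster() in its SparkConf, since the in-code setting would take precedence over the flag.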
Source: https://stackoverflow.com/questions/36885680/spark-configuration-priority