Question
I have an RStudio driver instance which is connected to a Spark cluster. I want to know whether there is a way to connect to the Spark cluster from RStudio using an external configuration file that specifies the number of executors, memory, and other Spark parameters. I know it can be done with the command below:
sparkR.session(sparkConfig = list(spark.cores.max = '2', spark.executor.memory = '8g'))
I am specifically looking for a method that takes the Spark parameters from an external file when starting the SparkR session.
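A manual workaround I can sketch (assuming a hypothetical key=value file named spark.conf; nothing here is built into SparkR) would be to parse the file in R and pass the result as sparkConfig, but I would prefer a built-in mechanism:

library(SparkR)

# spark.conf (hypothetical) holds lines like:
#   spark.cores.max=2
#   spark.executor.memory=8g
conf_lines <- readLines("spark.conf")
kv <- strsplit(conf_lines, "=", fixed = TRUE)
spark_conf <- setNames(
  as.list(vapply(kv, `[`, character(1), 2)),  # property values
  vapply(kv, `[`, character(1), 1)            # property names
)
sparkR.session(sparkConfig = spark_conf)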
Answer 1:
Spark uses a standardized configuration layout, with spark-defaults.conf used for specifying configuration options. This file should be located in one of the following directories:
SPARK_HOME/conf
SPARK_CONF_DIR
All you have to do is configure the SPARK_HOME or SPARK_CONF_DIR environment variable and put the configuration there.
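For example, assuming a placeholder directory /path/to/conf containing a spark-defaults.conf with the two settings from the question (properties are whitespace-separated):

spark.cores.max        2
spark.executor.memory  8g

then in R the session can be started without passing sparkConfig at all; the environment variable must be set before the session is created so that spark-submit can find the file:

Sys.setenv(SPARK_CONF_DIR = "/path/to/conf")  # placeholder path; SPARK_HOME works too
library(SparkR)
sparkR.session()  # settings should be picked up from spark-defaults.conf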
Each Spark installation comes with template files you can use as inspiration.
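For instance, SPARK_HOME/conf ships a spark-defaults.conf.template that can be copied and edited; a small sketch in R, assuming SPARK_HOME is already set:

conf_dir <- file.path(Sys.getenv("SPARK_HOME"), "conf")
# copy the shipped template to the file name Spark actually reads
file.copy(file.path(conf_dir, "spark-defaults.conf.template"),
          file.path(conf_dir, "spark-defaults.conf"))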
Source: https://stackoverflow.com/questions/49805162/starting-sparkr-session-using-external-config-file