I use Spark 1.3.0 in a cluster of 5 worker nodes with 36 cores and 58GB of memory each. I'd like to configure Spark's Standalone cluster with many executors per worker.
You first need to configure your Spark Standalone cluster, then set the amount of resources needed for each individual Spark application you want to run.
In order to configure the cluster, you can try this:
In conf/spark-env.sh:
SPARK_WORKER_INSTANCES=10   # number of Worker instances (#Executors) per node (default: 1)
SPARK_WORKER_CORES=15       # number of cores that one Worker can use (default: all cores, 36 in your case)
SPARK_WORKER_MEMORY=55g     # total amount of memory that can be used on one machine (Worker Node) for running Spark programs

Copy this configuration file to all Worker Nodes, in the same folder.
Start your cluster using the scripts in sbin (sbin/start-all.sh, ...). As you have 5 machines, with the above configuration you should see 5 (machines) * 10 (worker instances per machine) = 50 alive workers on the master's web interface (http://localhost:8080 by default).
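If you want to check this from code rather than the browser, the standalone master also serves the same status as JSON under /json on the web UI port. A minimal sketch, assuming the master UI is reachable at localhost:8080 (adjust host/port to your setup):

import scala.io.Source

// Dump the standalone master's status in JSON form; the "workers" array
// should list all 50 worker instances once they have registered.
println(Source.fromURL("http://localhost:8080/json").mkString)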
When you run an application in standalone mode, by default it will acquire all available executors in the cluster, so you need to explicitly set the amount of resources for that application. For example:
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setMaster(...)                       // your standalone master URL
  .setAppName(...)
  .set("spark.executor.memory", "2g")   // memory per executor for this application
  .set("spark.cores.max", "10")         // total cores this application may take from the cluster