Spark architecture revolves entirely around the concept of executors and cores. I would like to see, in practice, how many executors and cores are running for my Spark application.
This is an old question, but this is my code for figuring this out on Spark 2.3.0:
executor_count = len(spark.sparkContext._jsc.sc().statusTracker().getExecutorInfos()) - 1
cores_per_executor = int(spark.sparkContext.getConf().get('spark.executor.cores', '1'))
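A minimal usage sketch, assuming those two lines have already run in a PySpark session with a SparkSession named spark:

# Sketch only: combine the two values above to estimate total task parallelism.
total_task_cores = executor_count * cores_per_executor
print(f"{executor_count} executors x {cores_per_executor} cores = {total_task_cores} total cores")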
This is a Python example to get the number of cores (including the master's):
def workername():
    # Return the hostname of the worker executing this task
    import socket
    return str(socket.gethostname())

anrdd = sc.parallelize(['', ''])
# Each input element maps to the hostname of the worker that processed it
namesRDD = anrdd.flatMap(lambda e: (1, workername()))
namesRDD.count()
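A variation on the same idea, as a sketch rather than part of the original snippet, fans out one task per partition and collects the distinct worker hostnames actually used (the partition count of 100 is arbitrary; assumes a live SparkContext named sc):

# Sketch: collect the distinct hostnames of the workers that ran tasks.
import socket
hosts = (sc.parallelize(range(100), 100)
           .map(lambda _: socket.gethostname())
           .distinct()
           .collect())
print(hosts)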
getExecutorStorageStatus and getExecutorMemoryStatus both return the number of executors including the driver, as in the example snippet below.
/** Method that just returns the current active/registered executors
* excluding the driver.
* @param sc The spark context to retrieve registered executors.
* @return a list of executors each in the form of host:port.
*/
def currentActiveExecutors(sc: SparkContext): Seq[String] = {
val allExecutors = sc.getExecutorMemoryStatus.map(_._1)
val driverHost: String = sc.getConf.get("spark.driver.host")
allExecutors.filter(! _.split(":")(0).equals(driverHost)).toList
}
The configured number of executor instances can be read from the SparkConf:
sc.getConf.getInt("spark.executor.instances", 1)
Similarly, you can get all the properties and print them as below; you may find the core settings there as well:
sc.getConf.getAll.mkString("\n")
OR
sc.getConf.toDebugString
In most cases, spark.executor.cores gives the cores per executor, and spark.driver.cores should hold this value for the driver.
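As a rough PySpark sketch of reading these properties together (the defaults passed below are only illustrative; the real defaults depend on your cluster manager and submit options):

# Sketch only: read executor/driver core settings from the configuration.
conf = spark.sparkContext.getConf()
num_executors = int(conf.get('spark.executor.instances', '1'))
executor_cores = int(conf.get('spark.executor.cores', '1'))
driver_cores = int(conf.get('spark.driver.cores', '1'))
print(num_executors * executor_cores, "total executor cores,", driver_cores, "driver cores")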
The methods getExecutorStorageStatus and getExecutorMemoryStatus above are not implemented in the Python API.
EDIT: But they can be accessed via the Py4J bindings exposed from the SparkSession:
sc._jsc.sc().getExecutorMemoryStatus()
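For example, a minimal sketch (assuming a live SparkContext named sc) that uses this binding to count executors, subtracting one entry for the driver as described above:

# Sketch: the returned map has one entry per executor plus one for the driver.
num_executors = sc._jsc.sc().getExecutorMemoryStatus().size() - 1
print(num_executors)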