I am trying to submit a JAR with a Spark job to a YARN cluster from Java code. I am using SparkLauncher to submit the SparkPi example:
Process spark = new SparkLauncher()
        ...
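A launch()-based submission along these lines might look like the sketch below; the Spark home and example JAR paths are placeholders:

import org.apache.spark.launcher.SparkLauncher;

public class SparkPiSubmitter {
    public static void main(String[] args) throws Exception {
        // launch() returns a plain java.lang.Process, so the caller must drain
        // stdout/stderr (e.g. in separate threads) and wait for the process itself
        Process spark = new SparkLauncher()
                .setSparkHome("/path/to/spark")                // placeholder
                .setAppResource("/path/to/spark-examples.jar") // placeholder
                .setMainClass("org.apache.spark.examples.SparkPi")
                .setMaster("yarn-client")
                .launch();
        int exitCode = spark.waitFor();
        System.out.println("Spark process exited with code " + exitCode);
    }
}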
Since this is an old post, I would like to add an update that might help whoever reads it later. Spark 1.6.0 added some new methods to the SparkLauncher class, including:
def startApplication(listeners: SparkAppHandle.Listener*): SparkAppHandle
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.launcher.SparkLauncher
You can run the application without needing additional threads for stdout and stderr handling, plus you get status reporting for the running application. Use this code:
import scala.collection.JavaConverters._

val env = Map(
"HADOOP_CONF_DIR" -> hadoopConfDir,
"YARN_CONF_DIR" -> yarnConfDir
)
val handle = new SparkLauncher(env.asJava)
.setSparkHome(sparkHome)
.setAppResource("Jar/location/.jar")
.setMainClass("path.to.the.main.class")
.setMaster("yarn-client")
.setConf("spark.app.id", "AppID if you have one")
.setConf("spark.driver.memory", "8g")
.setConf("spark.akka.frameSize", "200")
.setConf("spark.executor.memory", "2g")
.setConf("spark.executor.instances", "32")
.setConf("spark.executor.cores", "32")
.setConf("spark.default.parallelism", "100")
.setConf("spark.driver.allowMultipleContexts","true")
.setVerbose(true)
.startApplication()
println(handle.getAppId)
println(handle.getState)
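Since startApplication accepts SparkAppHandle.Listener arguments, state changes can also be pushed to you instead of polled. A minimal sketch, in Java to match the question's context, assuming the same launcher configuration as above:

import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

// env is the same HADOOP_CONF_DIR/YARN_CONF_DIR map used above
SparkAppHandle handle = new SparkLauncher(env)
        // ... same setSparkHome/setAppResource/setMainClass/setMaster/setConf calls ...
        .startApplication(new SparkAppHandle.Listener() {
            @Override
            public void stateChanged(SparkAppHandle h) {
                System.out.println("State changed to " + h.getState());
            }
            @Override
            public void infoChanged(SparkAppHandle h) {
                System.out.println("Application id is now " + h.getAppId());
            }
        });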
You can keep querying the state of the Spark application until it succeeds. For information about how the launcher server works in 1.6.0, see this link: https://github.com/apache/spark/blob/v1.6.0/launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java
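A simple polling loop (shown in Java; the one-second sleep interval is arbitrary) might look like:

import org.apache.spark.launcher.SparkAppHandle;

// Poll until the handle reaches a terminal state (FINISHED, FAILED, or KILLED)
while (!handle.getState().isFinal()) {
    System.out.println("Current state: " + handle.getState());
    Thread.sleep(1000); // arbitrary polling interval; InterruptedException propagates to the caller
}
System.out.println("Final state: " + handle.getState());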