I'm trying to use Livy to remotely submit several Spark jobs. Let's say I want to perform the following spark-submit task remotely.
How can I make use of the SparkSession that I created using a POST /sessions request for submitting my Spark job using a POST /batches request?
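For reference, here's a minimal sketch (Python, standard library only; the Livy host, JAR path, and class name below are hypothetical placeholders) of the JSON bodies for the two requests in question:

```python
import json

LIVY_URL = "http://livy-host:8998"  # hypothetical Livy server

# POST /sessions -- starts a long-lived interactive SparkSession
session_payload = {
    "kind": "spark",        # interpreter kind: spark | pyspark | sparkr | sql
    "executorMemory": "2g",
}

# POST /batches -- one-shot job submission, the REST analogue of spark-submit
batch_payload = {
    "file": "hdfs:///jobs/my-spark-job.jar",  # hypothetical JAR location
    "className": "com.example.MySparkJob",    # hypothetical main class
    "args": ["--date", "2020-01-01"],
}

print(json.dumps(batch_payload, indent=2))
```

Each payload would be sent as the JSON body of an HTTP POST to the respective endpoint under LIVY_URL.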
Batch mode is intended for a different use-case than session mode / LivyClient.

The reason I've identified why this isn't possible (please correct me if I'm wrong / incomplete) is as follows:
- A POST /batches request accepts a JAR. This prevents the SparkSession (or spark-shell) from being re-used (without restarting the SparkSession), because:
  - how would you remove the JAR from the previous POST /batches request?
  - how would you add the JAR from the current POST /batches request?

And here's a more complete picture:
- A POST /batches request takes a JAR, thereby preventing the SparkSession from being re-used.
- A POST /sessions request doesn't take a JAR; the session (obviously) cannot take JARs. It takes code snippets (in PySpark: simple Python files) that can be loaded into the session (and not JARs).

Possible workaround:
Any Spark application written in Scala / Java, which must be bundled in a JAR, will face this difficulty; Python (PySpark) users are lucky here.

- Create a session with your JAR via a POST /sessions request.
- Then invoke the class from your JAR via Python (submit POST /sessions/{sessionId}/statements) as many times as you want (with possibly different parameters). While this wouldn't be straight-forward, it sounds very much possible.

Finally, I found some more alternatives to Livy for remote spark-submit; see this.
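The workaround described above can be sketched as follows (Python, standard library only; the host, JAR path, and the com.example.MySparkJob.run entry point are hypothetical assumptions, not a real API):

```python
import json

LIVY_URL = "http://livy-host:8998"  # hypothetical Livy server

# Step 1: create the session WITH the JAR attached (POST /sessions).
# The "jars" field makes the classes in the JAR visible to the session.
create_session_payload = {
    "kind": "spark",                            # Scala interpreter
    "jars": ["hdfs:///jobs/my-spark-job.jar"],  # hypothetical JAR
}

# Step 2: once the session is idle, invoke a class from the JAR via
# POST /sessions/{sessionId}/statements -- repeatable with different
# parameters, without restarting the SparkSession.
def statement_payload(run_date: str) -> dict:
    # com.example.MySparkJob.run(spark, date) is a hypothetical entry point
    code = 'com.example.MySparkJob.run(spark, "%s")' % run_date
    return {"code": code}

print(json.dumps(statement_payload("2020-01-01")))
```

The same statement endpoint can then be hit repeatedly with different dates (or any other parameters), which is what lets the one SparkSession be reused across jobs.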