I am working with a Scala program that uses Spark packages. Currently I run the program from the gateway with this bash command:

    /homes/spark/bin/spark-submit --master yarn-cluster ...
If you have a new enough version of Oozie, you can use Oozie's Spark action:
https://github.com/apache/oozie/blob/master/client/src/main/resources/spark-action-0.1.xsd
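For example, a minimal workflow.xml sketch using that spark action might look like this (the workflow/action names, job name, jar path, and the specific spark-opts are illustrative assumptions, not values from your cluster):

    <workflow-app name="spark-scala-wf" xmlns="uri:oozie:workflow:0.5">
        <start to="spark-node"/>
        <action name="spark-node">
            <spark xmlns="uri:oozie:spark-action:0.1">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <!-- equivalent to spark-submit --master yarn --deploy-mode cluster -->
                <master>yarn-cluster</master>
                <name>my-spark-job</name>
                <class>com.xxx.yyy.zzz</class>
                <jar>${nameNode}/user/me/lib/my-spark-job.jar</jar>
                <spark-opts>--queue ${queue_name} --num-executors ${spark_num_executors} --executor-cores ${spark_executor_cores}</spark-opts>
                <arg>${input}</arg>
                <arg>${output}</arg>
            </spark>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Spark action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>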
Otherwise you need to execute a java action that calls Spark's submit class itself. Something like:
Main class for the java action:

    org.apache.spark.deploy.SparkSubmit

Arguments, one per line:

    --class
    ${spark_main_class}          -> the main class of your job, e.g. com.xxx.yyy.zzz
    --deploy-mode
    cluster
    --master
    yarn
    --queue
    ${queue_name}                -> depends on your Oozie config
    --num-executors
    ${spark_num_executors}
    --executor-cores
    ${spark_executor_cores}
    ${spark_app_file}            -> the jar that contains your Spark job, written in Scala
    ${input}                     -> some arg
    ${output}                    -> some other arg
Files to make available to the action (most likely as <file> elements), so that both your application jar and the Spark assembly jar are shipped to the container:

    ${spark_app_file}
    ${name_node}/user/spark/share/lib/spark-assembly.jar
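Put together, a sketch of that java action in workflow.xml could look like the following (the action name and the ok/error transitions are assumptions; the ${...} properties are the same placeholders as in the list above):

    <action name="spark-submit-node">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${name_node}</name-node>
            <main-class>org.apache.spark.deploy.SparkSubmit</main-class>
            <arg>--class</arg>
            <arg>${spark_main_class}</arg>
            <arg>--deploy-mode</arg>
            <arg>cluster</arg>
            <arg>--master</arg>
            <arg>yarn</arg>
            <arg>--queue</arg>
            <arg>${queue_name}</arg>
            <arg>--num-executors</arg>
            <arg>${spark_num_executors}</arg>
            <arg>--executor-cores</arg>
            <arg>${spark_executor_cores}</arg>
            <arg>${spark_app_file}</arg>
            <arg>${input}</arg>
            <arg>${output}</arg>
            <file>${spark_app_file}</file>
            <file>${name_node}/user/spark/share/lib/spark-assembly.jar</file>
        </java>
        <ok to="end"/>
        <error to="fail"/>
    </action>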