Launching a Spark program using an Oozie workflow

余生分开走 2020-12-16 05:49

I am working with a Scala program that uses Spark packages. Currently I run the program with a bash command from the gateway: /homes/spark/bin/spark-submit --master yarn-cluster ... How can I launch it from an Oozie workflow instead?

1 Answer
  • 2020-12-16 06:05

    If you have a recent enough version of Oozie, you can use Oozie's built-in Spark action:

    https://github.com/apache/oozie/blob/master/client/src/main/resources/spark-action-0.1.xsd
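
    For reference, an action conforming to that schema looks roughly like this. This is a minimal sketch, not a drop-in config: the ${...} properties, the action name, and the yarn-cluster master value are assumptions you would adapt to your cluster.

        <action name="spark-node">
            <spark xmlns="uri:oozie:spark-action:0.1">
                <job-tracker>${job_tracker}</job-tracker>
                <name-node>${name_node}</name-node>
                <master>yarn-cluster</master>
                <name>MySparkJob</name>
                <class>${spark_main_class}</class>
                <jar>${spark_app_file}</jar>
                <!-- extra spark-submit flags are passed through spark-opts -->
                <spark-opts>--queue ${queue_name} --num-executors ${spark_num_executors} --executor-cores ${spark_executor_cores}</spark-opts>
                <!-- application arguments, same as on the command line -->
                <arg>${input}</arg>
                <arg>${output}</arg>
            </spark>
            <ok to="end"/>
            <error to="fail"/>
        </action>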

    Otherwise you need to execute a Java action that invokes SparkSubmit directly. Something like:

        <java>
            <main-class>org.apache.spark.deploy.SparkSubmit</main-class>

            <arg>--class</arg>
            <arg>${spark_main_class}</arg>            <!-- your main class, e.g. com.xxx.yyy.zzz -->

            <arg>--deploy-mode</arg>
            <arg>cluster</arg>

            <arg>--master</arg>
            <arg>yarn</arg>

            <arg>--queue</arg>
            <arg>${queue_name}</arg>                  <!-- depends on your Oozie config -->

            <arg>--num-executors</arg>
            <arg>${spark_num_executors}</arg>

            <arg>--executor-cores</arg>
            <arg>${spark_executor_cores}</arg>

            <arg>${spark_app_file}</arg>              <!-- jar that contains your Spark job, written in Scala -->

            <arg>${input}</arg>                       <!-- application argument -->
            <arg>${output}</arg>                      <!-- application argument -->

            <file>${spark_app_file}</file>

            <file>${name_node}/user/spark/share/lib/spark-assembly.jar</file>
        </java>
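
    In either case, the action has to live inside a complete workflow.xml. A minimal skeleton might look like the following; it assumes workflow schema 0.5, where a <global> section can supply the job tracker and name node, and the node names are placeholders.

        <workflow-app xmlns="uri:oozie:workflow:0.5" name="spark-submit-wf">
            <global>
                <job-tracker>${job_tracker}</job-tracker>
                <name-node>${name_node}</name-node>
            </global>
            <start to="spark-submit-node"/>
            <action name="spark-submit-node">
                <!-- the <java> (or <spark>) element from above goes here -->
                <ok to="end"/>
                <error to="fail"/>
            </action>
            <kill name="fail">
                <message>Spark job failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
            </kill>
            <end name="end"/>
        </workflow-app>

    You would then submit it with oozie job -run, passing a job.properties file that defines ${name_node}, ${queue_name}, and the other properties used above.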
    