Launching a Spark program using an Oozie workflow

余生分开走  2020-12-16 05:49

I am working on a Scala program that uses Spark packages. Currently I run the program from the gateway with the bash command: /homes/spark/bin/spark-submit --master yarn-clus

1 Answer
情书的邮戳  2020-12-16 06:05

    If you have a new enough version of Oozie, you can use Oozie's spark action:

    https://github.com/apache/oozie/blob/master/client/src/main/resources/spark-action-0.1.xsd
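
    A minimal sketch of such a spark action, assuming the 0.1 schema linked above. The element names come from that XSD; the ${...} properties and the transition targets (end, fail) are placeholders, not part of the original answer:

        <action name="spark-job">
            <spark xmlns="uri:oozie:spark-action:0.1">
                <job-tracker>${job_tracker}</job-tracker>
                <name-node>${name_node}</name-node>
                <master>yarn-cluster</master>
                <name>my-spark-job</name>
                <class>${spark_main_class}</class>
                <jar>${spark_app_file}</jar>
                <!-- extra spark-submit flags go into spark-opts -->
                <spark-opts>--queue ${queue_name} --num-executors ${spark_num_executors} --executor-cores ${spark_executor_cores}</spark-opts>
                <arg>${input}</arg>
                <arg>${output}</arg>
            </spark>
            <ok to="end"/>
            <error to="fail"/>
        </action>

    With this action, Oozie assembles the spark-submit invocation itself, so you do not have to call SparkSubmit directly.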

    Otherwise you need to execute a java action that calls Spark. Something like:

       
        <!-- this <java> element goes inside a workflow <action>; it invokes
             spark-submit's entry point directly -->
        <java>
            <main-class>org.apache.spark.deploy.SparkSubmit</main-class>

            <arg>--class</arg>
            <arg>${spark_main_class}</arg>   <!-- this is the class, e.g. com.xxx.yyy.zzz -->

            <arg>--deploy-mode</arg>
            <arg>cluster</arg>

            <arg>--master</arg>
            <arg>yarn</arg>

            <arg>--queue</arg>
            <arg>${queue_name}</arg>         <!-- depends on your oozie config -->

            <arg>--num-executors</arg>
            <arg>${spark_num_executors}</arg>

            <arg>--executor-cores</arg>
            <arg>${spark_executor_cores}</arg>

            <arg>${spark_app_file}</arg>     <!-- jar that contains your spark job, written in scala -->

            <arg>${input}</arg>              <!-- some arg -->
            <arg>${output}</arg>             <!-- some other arg -->

            <file>${spark_app_file}</file>

            <file>${name_node}/user/spark/share/lib/spark-assembly.jar</file>
        </java>
    
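    In both cases, the ${...} properties are resolved from the job.properties file you submit with the workflow. A hypothetical example (every value below is illustrative and must be adapted to your cluster):

        # hypothetical job.properties; all values are examples
        name_node=hdfs://namenode.example.com:8020
        job_tracker=resourcemanager.example.com:8032
        queue_name=default
        spark_main_class=com.xxx.yyy.zzz
        spark_app_file=${name_node}/user/spark/apps/my-spark-job.jar
        spark_num_executors=4
        spark_executor_cores=2
        input=/user/spark/input
        output=/user/spark/output
        oozie.wf.application.path=${name_node}/user/spark/workflows/spark-wf

    You can then submit and start the workflow with the Oozie CLI:

        oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run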
