Execute bash script on a dataproc cluster from a composer

Submitted by Deadly on 2021-01-27 22:55:18

Question


I want to add jars to a specific location on a Dataproc cluster once the cluster has been created, using a simple shell script.

I would like to automate this step from Composer: once the Dataproc cluster has been created, the next step should execute a bash script that adds the jars to the cluster.

Can you suggest which Airflow operator to use to execute bash scripts on the Dataproc cluster?


Answer 1:


For running a simple shell script on the master node, the easiest way would be to use a pig sh Dataproc job, such as the following:

gcloud dataproc jobs submit pig --cluster ${CLUSTER} --execute 'sh echo hello world'

or to use pig fs to copy the jarfile directly:

gcloud dataproc jobs submit pig --cluster ${CLUSTER} --execute 'fs -cp gs://foo/my_jarfile.jar file:///tmp/localjar.jar'

The equivalent Airflow setup for those gcloud commands is the DataProcPigOperator, with the same command string passed through its query parameter.

If you need to place jarfiles on all the nodes, it's better to just use an initialization action to copy the jarfiles at cluster startup time:

#!/bin/bash
# copy-jars.sh

gsutil cp gs://foo/my-jarfile.jar /tmp/localjar.jar
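For the initialization action to run, the script has to be staged in GCS and referenced when the cluster is created. The following is only a sketch of that step, not part of the original answer: the cluster name, region, and the gs://foo/copy-jars.sh location are placeholders, and the run helper with its DRY_RUN switch is a hypothetical convenience for previewing the commands.

```shell
#!/bin/bash
# Sketch: register copy-jars.sh as a Dataproc initialization action.
# CLUSTER, REGION and INIT_ACTION_URI are placeholder values (assumptions).

CLUSTER="${CLUSTER:-my-cluster}"
REGION="${REGION:-us-central1}"
INIT_ACTION_URI="${INIT_ACTION_URI:-gs://foo/copy-jars.sh}"

# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Upload the script to GCS so that every node can fetch it at startup.
stage_init_action() {
  run gsutil cp copy-jars.sh "$INIT_ACTION_URI"
}

# Create the cluster with the staged script attached as an initialization action.
create_cluster() {
  run gcloud dataproc clusters create "$CLUSTER" \
    --region "$REGION" \
    --initialization-actions "$INIT_ACTION_URI"
}
```

Calling stage_init_action and then create_cluster performs both steps; with DRY_RUN=1 the commands are only printed. When the cluster is created from Composer instead, the same GCS URI can be handed to the cluster-creation operator (in Airflow 1.10, the init_actions_uris argument of DataprocClusterCreateOperator).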

If you need to dynamically determine which jarfiles to copy onto all nodes sometime after the cluster has already been deployed, you could take the approach described here: use an initialization action which continuously watches some HDFS directory for jarfiles and copies them to a local directory. Then, whenever you need a jarfile to appear on all the nodes, you just submit a pig fs job that places the jarfile from GCS into the watched HDFS directory.

Generally you don't want something to automatically poll on GCS itself because GCS list requests cost money, whereas it's no extra cost to poll your Dataproc cluster's HDFS.
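The watcher idea could look roughly like the script below. This is a minimal sketch under assumptions, not the script from the linked approach: the file name watch-jars.sh, the HDFS directory /jars-to-sync, the local directory /tmp/jars, the polling interval, and the START_WATCHER switch are all hypothetical.

```shell
#!/bin/bash
# watch-jars.sh: hypothetical initialization action (sketch).
# All names and paths below are assumptions chosen for illustration.

WATCH_DIR="${WATCH_DIR:-/jars-to-sync}"   # HDFS directory to poll
LOCAL_DIR="${LOCAL_DIR:-/tmp/jars}"       # local destination on each node
POLL_SECONDS="${POLL_SECONDS:-60}"        # pause between polls

# Copy every jar currently in the watched HDFS directory to the local one.
sync_jars_once() {
  mkdir -p "$LOCAL_DIR"
  # The glob is quoted so that HDFS, not the local shell, expands it.
  # -f overwrites existing copies; errors (e.g. no jars yet) are ignored.
  hdfs dfs -get -f "${WATCH_DIR}/*.jar" "$LOCAL_DIR" 2>/dev/null || true
}

# Poll forever; intended to run in the background on each node.
watch_jars() {
  while true; do
    sync_jars_once
    sleep "$POLL_SECONDS"
  done
}

# Start the loop only when explicitly requested, so that sourcing this
# file for inspection does not block or spawn anything.
if [ "${START_WATCHER:-0}" = "1" ]; then
  watch_jars >/dev/null 2>&1 &
fi
```

With such a watcher running, a pig fs job like the copy command shown earlier, but with a destination such as hdfs:///jars-to-sync/, would be enough to make a new jarfile appear on every node. For a long-lived cluster a proper service unit is likely more robust than a backgrounded loop.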



Source: https://stackoverflow.com/questions/56034252/execute-bash-script-on-a-dataproc-cluster-from-a-composer
