How to run parallel Spark jobs using Airflow

Submitted by 北战南征 on 2019-12-02 03:46:15

Question


We have existing code in production that runs Spark jobs in parallel. We tried orchestrating some routine Spark jobs with Airflow and succeeded, but now we are not sure how to handle the Spark jobs that run in parallel.

Can CeleryExecutor help in this case?
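(For context: CeleryExecutor distributes Airflow tasks across multiple worker machines, but even the default setup can run independent tasks of one DAG concurrently. How many run at once is governed by Airflow's configuration. A sketch of the relevant airflow.cfg settings, as they appear in Airflow 1.x; the exact values here are illustrative, not a recommendation:)

[core]
# Max task instances running across the whole Airflow installation
parallelism = 32
# Max task instances allowed to run concurrently within a single DAG
dag_concurrency = 16

[celery]
# Tasks each Celery worker process will take at once (CeleryExecutor only)
worker_concurrency = 16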

Or should we modify our existing Spark job so that it no longer runs in parallel? Personally, I do not like the latter approach.

Our existing shell script runs the Spark jobs in parallel, roughly like this, and we would like to run this script from Airflow:

cat outfile.txt | parallel -k -j2 submitspark {} /data/list
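(For readers unfamiliar with GNU parallel: the pipe above runs `submitspark <line> /data/list` once per line of outfile.txt, at most two jobs at a time (-j2), keeping output in input order (-k). A minimal Python sketch of the same pattern, in case the logic is ported into an Airflow task rather than shelled out; `submitspark` and `/data/list` are taken from the question, and `make_spark_cmd` is a hypothetical helper, not the asker's code:)

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(lines, make_cmd, max_workers=2):
    """Run make_cmd(line) for each line, at most max_workers at a time.

    Results come back in input order, mirroring GNU parallel's
    -k (keep order) and -j2 (two concurrent jobs) flags.
    """
    def run(line):
        return subprocess.run(make_cmd(line), capture_output=True, text=True)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in submission order, like parallel -k
        return list(pool.map(run, lines))


def make_spark_cmd(line):
    # Hypothetical wrapper around the question's `submitspark {} /data/list`
    return ["submitspark", line, "/data/list"]
```

(One could call this from a PythonOperator, or keep the shell pipeline and invoke it via a BashOperator; either way Airflow sees one task while the parallelism stays inside it.)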

Please suggest.

来源:https://stackoverflow.com/questions/57857780/howto-run-parallel-spark-job-using-airflow
