Workflow orchestration for Google Dataflow


Question


We are using Google Dataflow for batch data processing and are looking for a workflow orchestration tool, something similar to what Azkaban does for Hadoop.

The key things we are looking for are:

  • Configuring workflows
  • Scheduling workflows
  • Monitoring and alerting failed workflows
  • Ability to rerun failed jobs

We have evaluated Pentaho, but these features are only available in its Enterprise edition, which is expensive. We are currently evaluating Azkaban, as it supports the javaprocess job type. However, Azkaban was created primarily for Hadoop jobs, so it is much more deeply integrated with the Hadoop infrastructure than with plain Java processes.

We would appreciate suggestions for open-source or very low-cost solutions.


Answer 1:


It sounds like Apache Airflow (https://github.com/apache/incubator-airflow) should meet your needs, and it now has a Dataflow operator (https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/dataflow_operator.py).
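
To illustrate, here is a minimal sketch of an Airflow DAG that launches a Dataflow job with the contrib operator linked above (Airflow 1.x naming assumed). The project id, bucket, and jar path are hypothetical placeholders, not values from the question:

    # Minimal Airflow DAG sketch: schedule a Dataflow batch job daily,
    # retry on failure, and email on failure (placeholders throughout).
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.contrib.operators.dataflow_operator import DataFlowJavaOperator

    default_args = {
        "owner": "data-eng",
        "retries": 1,                          # rerun failed jobs automatically
        "retry_delay": timedelta(minutes=10),
        "email_on_failure": True,              # alerting on failed workflows
        "email": ["alerts@example.com"],       # hypothetical address
    }

    with DAG(
        dag_id="dataflow_batch_pipeline",
        default_args=default_args,
        start_date=datetime(2019, 1, 1),
        schedule_interval="@daily",            # workflow scheduling
        catchup=False,
    ) as dag:

        run_dataflow = DataFlowJavaOperator(
            task_id="run_dataflow_batch_job",
            jar="gs://my-bucket/pipelines/my-pipeline-bundled.jar",  # hypothetical path
            options={
                "project": "my-gcp-project",               # hypothetical project id
                "tempLocation": "gs://my-bucket/tmp",
            },
        )

The retries/email settings and the schedule_interval cover the rerun, alerting, and scheduling requirements from the question; the Airflow web UI provides the monitoring piece.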




Answer 2:


To orchestrate Google Dataflow you can use Cloud Composer, a managed workflow orchestration service built on Apache Airflow. It gives more flexibility: with it you can orchestrate most Google services, as well as workflows that span on-premises systems and the public cloud.
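
As a rough, self-contained sketch of what that looks like, the DAG below (using the Airflow 1.x contrib operators, with hypothetical project, bucket, jar, and table names) chains a Dataflow job to a downstream BigQuery task; deployed to a Composer environment's DAGs bucket, it runs unchanged on the managed Airflow:

    # Hypothetical Composer-hosted DAG: run a Dataflow job, then a BigQuery query.
    from datetime import datetime

    from airflow import DAG
    from airflow.contrib.operators.dataflow_operator import DataFlowJavaOperator
    from airflow.contrib.operators.bigquery_operator import BigQueryOperator

    with DAG(
        dag_id="composer_dataflow_to_bigquery",
        start_date=datetime(2019, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:

        run_dataflow = DataFlowJavaOperator(
            task_id="run_dataflow_batch_job",
            jar="gs://my-bucket/pipelines/my-pipeline-bundled.jar",  # hypothetical
            options={"project": "my-gcp-project", "tempLocation": "gs://my-bucket/tmp"},
        )

        build_report = BigQueryOperator(
            task_id="build_daily_report",
            sql="SELECT COUNT(*) AS n FROM `my-gcp-project.analytics.events`",  # hypothetical
            use_legacy_sql=False,
        )

        run_dataflow >> build_report  # the report runs only after the Dataflow job succeeds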



Source: https://stackoverflow.com/questions/39006399/workflow-orchestration-for-google-dataflow
