Suggestion for scheduling tool(s) for building Hadoop-based data pipelines

青春壹個敷衍的年華 posted on 2019-12-22 17:54:56

Question


Between Apache Oozie, Spotify/Luigi and airbnb/airflow, what are the pros and cons of each?

I have used Oozie and Airflow in the past to build a data ingestion pipeline with Pig and Hive. Currently, I am building a pipeline that reads logs, extracts useful events, and loads them into Redshift.

I found Airflow much easier to use, test, and set up. It has a much nicer UI and lets users perform actions from the UI itself, which is not the case with Oozie. Any information about Luigi, or other insights regarding stability and known issues, is welcome.


Answer 1:


  • Azkaban: Nice UI, relatively simple, accessible for non-programmers. Has a longish history at LinkedIn.
    • Check out the Azkaban CLI project for programmatic job creation. I have an Azkaban example workflows project on GitHub.
  • Airflow: Decent UI, Python-ish job definition, semi-accessible for non-programmers, dependency declaration syntax is weird.
  • Luigi: OK UI, workflows are pure Python, requires a solid grasp of Python coding and object-oriented concepts, hence not suitable for non-programmers.
  • Oozie: Insane XML-based job definitions. Here be dragons. ;-)
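
For anyone unsure what "dependency declaration" amounts to in these tools: each of them, in its own syntax, describes a graph of tasks and runs them in dependency order. Here is a rough, tool-agnostic sketch using only the Python standard library (`graphlib`, Python 3.9+). The task names and the `run_pipeline` helper are hypothetical, and this is not the API of Azkaban, Airflow, Luigi or Oozie.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key maps to the set of tasks it depends on.
dependencies = {
    "extract_events": {"fetch_logs"},
    "load_to_redshift": {"extract_events"},
    "fetch_logs": set(),
}


def run_pipeline(deps, actions):
    """Run every task after all of its dependencies; return the order used."""
    order = []
    for task in TopologicalSorter(deps).static_order():
        actions[task]()  # a real scheduler would also handle retries, logging, etc.
        order.append(task)
    return order


# Placeholder actions that only announce themselves.
actions = {
    name: (lambda name=name: print(f"running {name}")) for name in dependencies
}

executed = run_pipeline(dependencies, actions)
```

The real tools differ mainly in how you write down `dependencies` (property files, Python operators, Python classes, XML) and in what they add on top: scheduling, retries, and a UI.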

IMHO, Azkaban enforces simplicity (can’t use features that don’t exist) and the others subtly encourage complexity.

Simpler pipelines are better than complex pipelines: easier to create, easier to understand (especially when you didn’t create them) and easier to debug/fix.

When complex actions are needed, you want to encapsulate them in a way that either completely succeeds or completely fails.

If you can make it idempotent (running it again creates identical results) then that’s even better.
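
As a minimal sketch of those two properties, assuming a step whose only output is a single local file: write to a temporary file and rename it into place, so the output either appears complete or not at all, and derive the output purely from the input so a rerun produces the same file. The function names, the `EVENT` keyword and the file layout are made up for illustration and are not tied to any of the schedulers above.

```python
import os
import tempfile


def write_atomically(path, lines):
    """Write lines to path so the file is either fully written or left untouched."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            for line in lines:
                handle.write(line + "\n")
        # Rename into place; atomic when source and target share a filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, remove the partial temp file and leave path as it was.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def extract_events(log_lines, keyword="EVENT"):
    """Pure function of the input: deduplicated, sorted lines containing keyword."""
    return sorted({line.strip() for line in log_lines if keyword in line})


def run_step(log_lines, out_path):
    """Idempotent step: running it again with the same input rewrites the same file."""
    write_atomically(out_path, extract_events(log_lines))
```

Calling `run_step(logs, "events.txt")` twice with the same `logs` leaves the same file content both times, and a crash partway through never leaves a half-written `events.txt` behind.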




Answer 2:


This post will give you an initial idea of the different workflow options:

http://bytepawn.com/luigi-airflow-pinball.html



Source: https://stackoverflow.com/questions/35733441/suggestion-for-scheduling-tools-for-building-hadoop-based-data-pipelines
