Airflow setup for high availability

℡╲_俬逩灬. Submitted on 2019-11-30 03:06:45

Question


How can the Apache Airflow (formerly known as Airbnb's Airflow) scheduler be deployed in high availability?

I am not asking about the backend DB or RabbitMQ, which should obviously be deployed in a high availability configuration.

My main focus is the scheduler - is there anything special that needs to be done?


Answer 1:


After a bit of digging I found that it is not safe to run multiple schedulers simultaneously. This means that, out of the box, Airflow schedulers are not safe to use in high availability environments.

The Airflow team is planning to solve this issue by adding a lock mechanism on the DAG data structure, but this is not implemented yet (I checked by running 2 schedulers and saw that they both schedule the same DAG instances, which is not good). This is described here: https://groups.google.com/forum/#!topic/airbnb_airflow/-1wKa3OcwME

I did find a way to work around this high availability issue by wrapping the schedulers with my own code and using cluster tools for leader election (I personally use Consul for this purpose). This way only the elected master runs the scheduler, and when the master goes down the slave replaces it.
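
As a rough illustration of the wrapper idea (not the answerer's actual code), here is a minimal sketch that uses Consul's session and KV lock HTTP endpoints for leader election. The agent address, the lock key name, the 15s TTL and the 5 second poll interval are all assumptions made for the example.

```python
import json
import subprocess
import time
import urllib.request

CONSUL = "http://127.0.0.1:8500"               # assumption: local Consul agent
LOCK_KEY = "service/airflow-scheduler/leader"  # hypothetical key name


def consul_put(path, body=b""):
    """PUT to the Consul HTTP API and return the decoded JSON response."""
    req = urllib.request.Request(CONSUL + path, data=body, method="PUT")
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read().decode())


def create_session(put=consul_put, ttl="15s"):
    """Create a TTL session; the lock is released if we stop renewing it."""
    payload = json.dumps({"Name": "airflow-scheduler", "TTL": ttl}).encode()
    return put("/v1/session/create", payload)["ID"]


def try_acquire(session_id, put=consul_put):
    """Return True if this session holds (or just obtained) the lock."""
    return put(f"/v1/kv/{LOCK_KEY}?acquire={session_id}") is True


def start_scheduler():
    """Launch the scheduler as a child process."""
    return subprocess.Popen(["airflow", "scheduler"])


def supervise(put=consul_put, start=start_scheduler, sleep=time.sleep, rounds=None):
    """Run the scheduler only while this node holds the Consul lock.

    `rounds` limits the number of loop iterations (None means loop forever).
    """
    session_id, proc, n = None, None, 0
    while rounds is None or n < rounds:
        n += 1
        try:
            if session_id is None:
                session_id = create_session(put)
            put(f"/v1/session/renew/{session_id}")
            leader = try_acquire(session_id, put)
        except OSError:
            # Consul unreachable or session expired: assume leadership is lost.
            session_id, leader = None, False
        if leader and (proc is None or proc.poll() is not None):
            proc = start()       # we are the leader and nothing is running
        elif not leader and proc is not None:
            proc.terminate()     # lost the lock: stop scheduling immediately
            proc.wait()
            proc = None
        sleep(5)
    return proc
```

Calling `supervise()` with no arguments on every scheduler host would leave one host running `airflow scheduler` and the others idle until the lock is freed. A short window with no scheduler at all during failover is expected with this design.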

Please consider this when you use Airflow in high availability environments, since out of the box the Airflow scheduler is currently not suitable for this (unless you solve this issue yourself).

Edit - an alternative approach to the master/slave solution is to use a cluster manager/scheduler to make sure that exactly one Airflow scheduler instance is running at any time. This approach relies on the self-healing abilities of the cluster manager you have. For example, both Mesos and Nomad support this kind of configuration (I personally chose Nomad for its simplicity).
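
To make the cluster manager approach a little more concrete, the sketch below builds a Nomad service job (in Nomad's JSON job format) that pins the scheduler to a single instance and posts it to the Nomad HTTP API. The job name, the datacenter, the `exec` driver and the agent address are assumptions for illustration only; a real job would also need resources, environment and the right path to the `airflow` binary.

```python
import json
import urllib.request


def scheduler_job(datacenters=("dc1",)):
    """Build a Nomad job that keeps exactly one scheduler task running."""
    return {
        "Job": {
            "ID": "airflow-scheduler",        # hypothetical job name
            "Name": "airflow-scheduler",
            "Type": "service",                # long-running, rescheduled on failure
            "Datacenters": list(datacenters),
            "TaskGroups": [{
                "Name": "scheduler",
                "Count": 1,                   # never more than one instance
                "Tasks": [{
                    "Name": "scheduler",
                    "Driver": "exec",         # assumption: exec driver is enabled
                    "Config": {"command": "airflow", "args": ["scheduler"]},
                }],
            }],
        }
    }


def submit(job, nomad="http://127.0.0.1:4646"):
    """Register the job with a Nomad agent (address is an assumption)."""
    req = urllib.request.Request(
        nomad + "/v1/jobs",
        data=json.dumps(job).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode())
```

The important part is `Count: 1` combined with the `service` job type: the cluster manager restarts or reschedules the task when it dies, which is what replaces the explicit leader election.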




Answer 2:


My personal experience was to follow some best practices I found: restart the scheduler every 10 runs ( -n 10 ) and use this software when possible:

https://github.com/teamclairvoyant/airflow-scheduler-failover-controller
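
The periodic restart can be sketched as a small supervisor loop. This is only an illustration: it assumes the Airflow 1.x CLI, where (as far as I know) the number of runs is passed as lowercase `-n` / `--num_runs`, and the 5 second pause and the `max_restarts` parameter are made up for the example. In practice a process supervisor such as systemd or supervisord usually plays this role.

```python
import subprocess
import time

# The scheduler exits by itself after 10 runs and is then started again.
SCHEDULER_CMD = ["airflow", "scheduler", "-n", "10"]


def run_forever(run=subprocess.call, sleep=time.sleep, pause=5, max_restarts=None):
    """Restart the scheduler every time it exits.

    `max_restarts` bounds the loop (None means loop forever).
    Returns the number of times the scheduler was launched.
    """
    launches = 0
    while max_restarts is None or launches < max_restarts:
        exit_code = run(SCHEDULER_CMD)   # blocks until the scheduler exits
        launches += 1
        print(f"scheduler exited with code {exit_code}, restarting")
        sleep(pause)                     # avoid a tight loop if it crashes at startup
    return launches
```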

I also use a DAG which pings a monitoring system to be sure that the scheduler has not gone away.
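
A heartbeat DAG of this kind might look roughly like the sketch below. The monitoring URL, the DAG id and the 5 minute interval are hypothetical, and the operator import path is the Airflow 1.x one. The monitoring side is assumed to raise an alert when the pings stop arriving, which is what reveals a dead scheduler.

```python
import urllib.request

# Hypothetical heartbeat endpoint of the monitoring system.
MONITOR_URL = "http://monitoring.example.com/heartbeat/airflow-scheduler"


def ping_monitor(url=MONITOR_URL, opener=urllib.request.urlopen):
    """Send one heartbeat; fail the task if the monitor does not answer 2xx."""
    with opener(url, timeout=10) as resp:
        status = resp.status
    if not 200 <= status < 300:
        raise RuntimeError(f"heartbeat rejected with HTTP {status}")
    return status


def build_heartbeat_dag():
    """Build a DAG that pings the monitor every 5 minutes (Airflow 1.x imports)."""
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    dag = DAG(
        "scheduler_heartbeat",               # hypothetical DAG id
        start_date=datetime(2019, 1, 1),
        schedule_interval=timedelta(minutes=5),
        catchup=False,                       # do not backfill missed heartbeats
    )
    PythonOperator(task_id="ping_monitor", python_callable=ping_monitor, dag=dag)
    return dag
```

In a file inside the DAGs folder, a module-level line such as `dag = build_heartbeat_dag()` is needed so that Airflow discovers the DAG.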



Source: https://stackoverflow.com/questions/39572079/airflow-setup-for-high-availability
