Why is it recommended against using a dynamic start_date in Airflow?

被刻印的时光 ゝ 提交于 2019-11-26 17:24:16

问题


I've read Airflow's FAQ about "What's the deal with start_date?", but it still isn't clear to me why it is recommended against using dynamic start_date.

To my understanding, a DAG's execution_date is determined by the minimum start_date between all of the DAG's tasks, and subsequent DAG Runs are ran at the latest execution_date + schedule_interval.

If I set my DAG's default_args start_date to be for, say, yesterday at 20:00:00, with a schedule_interval of 1 day, how would that break or confuse the scheduler, if at all? If I understand correctly, the scheduler would trigger the DAG with an execution_date of yesterday at 20:00:00, and the next DAG Run would be scheduled for today at 20:00:00.

Is there some concept that I'm missing?


回答1:


First run would be at start_date+schedule_interval. It doesn't run dag on start_date, it always runs on start_date+schedule_interval.

As they mentioned in document if you give start_date dynamic for e.g. datetime.now() and give some schedule_interval(1 hour), it will never execute that run as now() moves along with time and datetime.now()+ 1 hour is not possible




回答2:


The scheduler expects to see a constant start date and interval. If you change it the scheduler might not notice until it reloads the DagBag, and if the new start date doesn't line up with your old schedule it might break depends_on_past behavior.

If you don't need depends_on_past the simplest might be to stop using the scheduler, set the start date to some arbitrary old date, and externally trigger the DAG however you like using a crontab or similar.



来源:https://stackoverflow.com/questions/41134524/why-is-it-recommended-against-using-a-dynamic-start-date-in-airflow

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!