Is it possible for the Airflow scheduler to finish the previous day's cycle before starting the next?

こ雲淡風輕ζ submitted on 2019-12-03 01:02:50
Oleg Yamin

This answer may be a bit late, but I ran into the same issue. I resolved it by adding two extra tasks to each DAG: "Previous" at the start and "Complete" at the end. The Previous task is an ExternalTaskSensor that monitors the previous run of the same DAG. Complete is just a DummyOperator. Let's say the DAG runs every 30 minutes; it would then look like this:

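from datetime import timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.dummy_operator import DummyOperator
from airflow.sensors.external_task_sensor import ExternalTaskSensor

# Import paths above are for Airflow 1.10.x; default_args is assumed to be defined earlier.
dag = DAG(dag_id='TEST_DAG', default_args=default_args, schedule_interval=timedelta(minutes=30))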

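# Wait until the final task of the run scheduled 30 minutes earlier has succeeded.
PREVIOUS = ExternalTaskSensor(
    task_id='Previous_Run',
    external_dag_id='TEST_DAG',
    external_task_id='All_Tasks_Completed',
    allowed_states=['success'],
    execution_delta=timedelta(minutes=30),
    dag=dag  # the DAG instance created above, not the DAG class
)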

T1 = BashOperator(
    task_id='TASK_01',
    bash_command='echo "Hello World from Task 1"',
    dag=dag
)

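# Marker task that the next run's sensor watches.
COMPLETE = DummyOperator(
    task_id='All_Tasks_Completed',
    dag=dag  # the DAG instance created above, not the DAG class
)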

PREVIOUS >> T1 >> COMPLETE

So the next DAG run, even though it enters the queue, will not let its tasks run until its PREVIOUS task has completed.
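To make the role of execution_delta concrete, here is a minimal sketch of the date arithmetic involved: the sensor looks for the run whose execution date equals the current run's execution date minus execution_delta. The helper name monitored_execution_date is hypothetical and is not part of Airflow; it only mirrors the subtraction.

```python
from datetime import datetime, timedelta


def monitored_execution_date(execution_date, execution_delta):
    """Return the execution date of the run the sensor waits on.

    Hypothetical helper for illustration only: it mirrors the subtraction
    that ExternalTaskSensor applies when execution_delta is given.
    """
    return execution_date - execution_delta


# A run with execution date 01:30 waits on the run with execution date 01:00.
current_run = datetime(2019, 12, 3, 1, 30)
print(monitored_execution_date(current_run, timedelta(minutes=30)))
# 2019-12-03 01:00:00
```

Note that execution_delta has to match the schedule interval for this to point at the immediately preceding run. The very first run has no earlier run to find, so its sensor would presumably keep waiting until it times out unless that task is marked as successful manually.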

If you just want to run one instance at a time, try setting max_active_runs=1.

What ended up working for me is a combination of:

  1. Adding task dependencies: wait_for_downstream=True, depends_on_past=True
  2. Adding max_active_runs=1 when creating the DAG. I did try to add max_active_runs as a default argument, but that did not work.
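The combination above can be sketched as follows. The likely reason max_active_runs has no effect in default_args is that default_args is handed to each operator, whereas max_active_runs is a parameter of the DAG itself. The owner, start_date, and the build_dag helper are assumptions made for illustration; the DAG id and schedule are borrowed from the earlier answer.

```python
from datetime import datetime, timedelta

# Task-level settings: Airflow passes default_args to every operator in the DAG.
default_args = {
    "owner": "airflow",                  # hypothetical value
    "start_date": datetime(2019, 1, 1),  # hypothetical value
    "depends_on_past": True,       # a task waits for its own previous instance to succeed
    "wait_for_downstream": True,   # ...and for that instance's immediate downstream tasks
}

# DAG-level settings: max_active_runs belongs here, not in default_args.
dag_kwargs = {
    "dag_id": "TEST_DAG",
    "default_args": default_args,
    "schedule_interval": timedelta(minutes=30),
    "max_active_runs": 1,          # at most one active run of this DAG at a time
}


def build_dag():
    """Create the DAG (hypothetical helper; imports Airflow only when called)."""
    from airflow import DAG
    return DAG(**dag_kwargs)
```

In a real DAG file you would assign dag = build_dag() at module level (or call DAG(...) directly with the same arguments) so that the scheduler can discover it.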