Right now, nodes in my DAG proceed to the next day's tasks before the rest of the nodes in that day's DAG run finish. Is there a way to make it wait for the rest of the DAG to finish before moving on to the next day's DAG cycle?
(I do have depends_on_past set to true, but that does not work in this case.)
My DAG looks like this:

O
|
V
O -> O -> O -> O -> O

[Tree view screenshot of the DAG]

Might be a bit late with this answer, but I ran into the same issue, and the way I resolved it was to add two extra tasks to each DAG: "Previous" at the start and "Complete" at the end. The "Previous" task is an ExternalTaskSensor that monitors the previous run of the same DAG; "Complete" is just a DummyOperator. Let's say the DAG runs every 30 minutes, so it would look like this:
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.dummy_operator import DummyOperator
# Import path varies by Airflow version; this is the 1.10-era location.
from airflow.sensors.external_task_sensor import ExternalTaskSensor

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2017, 1, 1),
}

dag = DAG(dag_id='TEST_DAG', default_args=default_args,
          schedule_interval=timedelta(minutes=30))

# Waits for the 'All_Tasks_Completed' task of this DAG's previous run
# (execution_delta must match the schedule interval).
PREVIOUS = ExternalTaskSensor(
    task_id='Previous_Run',
    external_dag_id='TEST_DAG',
    external_task_id='All_Tasks_Completed',
    allowed_states=['success'],
    execution_delta=timedelta(minutes=30),
    dag=dag  # was dag=DAG, which passes the class instead of the instance
)

T1 = BashOperator(
    task_id='TASK_01',
    bash_command='echo "Hello World from Task 1"',
    dag=dag
)

COMPLETE = DummyOperator(
    task_id='All_Tasks_Completed',
    dag=dag  # was dag=DAG here as well
)

PREVIOUS >> T1 >> COMPLETE
So even though the next DAG run will come into the queue, its tasks will not start until PREVIOUS succeeds.
If you just want one DAG run active at a time, try setting max_active_runs=1 on the DAG.
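For illustration, a minimal sketch of that setting (the DAG id, start date, and task are placeholders, not from the question):

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

# max_active_runs=1 tells the scheduler never to start a new run of this
# DAG while a previous run is still in progress.
dag = DAG(
    dag_id='single_run_dag',            # hypothetical DAG id
    start_date=datetime(2017, 1, 1),
    schedule_interval=timedelta(days=1),
    max_active_runs=1,
)

start = DummyOperator(task_id='start', dag=dag)
```

Note that this serializes whole runs but, on its own, does not fail or skip the next run if the previous one failed; combine it with depends_on_past for that behavior.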
What ended up working for me is a combination of:
- Adding the task-level arguments wait_for_downstream=True and depends_on_past=True
- Adding max_active_runs=1 when creating the DAG. I did try passing max_active_runs as a default argument, but that did not work (it is a DAG-level parameter, not a task-level one).
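Putting that combination together might look like the following sketch (DAG id, dates, and task names are hypothetical):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

default_args = {
    'owner': 'airflow',
    'depends_on_past': True,      # each task waits for its own previous-run instance
    'wait_for_downstream': True,  # ...and for that instance's immediate downstream tasks
}

dag = DAG(
    dag_id='sequential_daily_dag',   # hypothetical DAG id
    default_args=default_args,
    start_date=datetime(2017, 1, 1),
    schedule_interval='@daily',
    max_active_runs=1,               # goes on the DAG itself, not in default_args
)

t1 = DummyOperator(task_id='task_1', dag=dag)
t2 = DummyOperator(task_id='task_2', dag=dag)
t1 >> t2
```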
Source: https://stackoverflow.com/questions/41009228/is-it-possible-for-airflow-scheduler-to-first-finish-the-previous-days-cycle-be