Question
Right now, nodes in my DAG proceed to the next day's tasks before the rest of that DAG's nodes finish. Is there a way to make it wait for the rest of the DAG to finish before moving on to the next day's DAG cycle?
(I do have depends_on_past set to true, but that does not work in this case.)
My DAG looks like this:
O
|
v
O -> O -> O -> O -> O
[Also attached: a tree-view screenshot of the DAG]

Answer 1:
This answer might be a bit late, but I ran into the same issue, and I resolved it by adding two extra tasks to each DAG: "Previous" at the start and "Complete" at the end. The Previous task is an ExternalTaskSensor that monitors the previous run of the same DAG; Complete is just a DummyOperator. Let's say the DAG runs every 30 minutes; it would then look like this:
from datetime import timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.dummy_operator import DummyOperator
from airflow.sensors.external_task_sensor import ExternalTaskSensor

# default_args is assumed to be defined earlier in the file
dag = DAG(dag_id='TEST_DAG', default_args=default_args,
          schedule_interval=timedelta(minutes=30))

# Sensor that waits for the previous run's final task to succeed
PREVIOUS = ExternalTaskSensor(
    task_id='Previous_Run',
    external_dag_id='TEST_DAG',
    external_task_id='All_Tasks_Completed',
    allowed_states=['success'],
    execution_delta=timedelta(minutes=30),
    dag=dag
)

T1 = BashOperator(
    task_id='TASK_01',
    bash_command='echo "Hello World from Task 1"',
    dag=dag
)

# Marker task that the next run's sensor watches
COMPLETE = DummyOperator(
    task_id='All_Tasks_Completed',
    dag=dag
)

PREVIOUS >> T1 >> COMPLETE
So even though the next DAG run will enter the queue, its tasks will not run until PREVIOUS completes.
Answer 2:
If you just want one DAG run active at a time, try setting max_active_runs=1.
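A minimal sketch of where that setting goes (the dag_id, start date, and schedule here are illustrative, not from the answer): max_active_runs is a keyword argument of the DAG constructor itself.

```python
from datetime import datetime, timedelta

from airflow import DAG

# max_active_runs=1 tells the scheduler never to start a second
# run of this DAG while an earlier run is still in progress.
dag = DAG(
    dag_id='TEST_DAG',
    start_date=datetime(2021, 1, 1),
    schedule_interval=timedelta(minutes=30),
    max_active_runs=1,
)
```

Note that this only serializes whole DAG runs; it does not by itself make a failed run block the next one.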
Answer 3:
What ended up working for me is a combination of:
- Adding the task-level dependency settings wait_for_downstream=True and depends_on_past=True
- Passing max_active_runs=1 when creating the DAG. I did try adding max_active_runs as a default argument, but that did not work.
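The combination above can be sketched as follows (a hypothetical example; the dag_id, owner, and schedule are placeholders). The task-level flags go into default_args so every task inherits them, while max_active_runs must be passed to the DAG constructor:

```python
from datetime import datetime, timedelta

from airflow import DAG

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2021, 1, 1),
    # Each task instance waits for its own success in the previous run...
    'depends_on_past': True,
    # ...and also for the success of its immediate downstream tasks there.
    'wait_for_downstream': True,
}

# max_active_runs goes on the DAG itself, not in default_args
dag = DAG(
    dag_id='TEST_DAG',
    default_args=default_args,
    schedule_interval=timedelta(minutes=30),
    max_active_runs=1,
)
```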
Source: https://stackoverflow.com/questions/41009228/is-it-possible-for-airflow-scheduler-to-first-finish-the-previous-days-cycle-be