Airflow 1.9.0 is queuing but not launching tasks

Anonymous (unverified), submitted 2019-12-03 01:58:03

Question:

Airflow is randomly not running queued tasks; some tasks don't even reach the queued state. I keep seeing the following in the scheduler logs:

 [2018-02-28 02:24:58,780] {jobs.py:1077} INFO - No tasks to consider for execution. 

I do see tasks in the database that have either no status or queued status, but they never get started.

The Airflow setup runs https://github.com/puckel/docker-airflow on ECS with Redis. There are 4 scheduler threads and 4 Celery worker tasks. The tasks that are not running show a queued state (grey icon); hovering over the task icon shows the operator as null, and the task details say:
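For reference, a CeleryExecutor setup like the one described roughly corresponds to an `airflow.cfg` along these lines (the values and hostnames are illustrative, not taken from the question):

```ini
[core]
; CeleryExecutor hands queued tasks to workers via the message broker
executor = CeleryExecutor
parallelism = 32

[scheduler]
; the 4 scheduler threads mentioned in the question
max_threads = 4

[celery]
; illustrative Redis broker URL; adjust host/port/db to your environment
broker_url = redis://redis:6379/0
celery_result_backend = db+postgresql://airflow:airflow@postgres/airflow
; the 4 Celery worker slots mentioned in the question
celeryd_concurrency = 4
```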

    All dependencies are met but the task instance is not running. In most cases this just means that the task will probably be scheduled soon unless:
    - The scheduler is down or under heavy load

Metrics on the scheduler do not show heavy load. The DAG is very simple, with 2 independent tasks dependent only on the last run. There are also tasks in the same DAG that are stuck with no status (white icon).

The interesting thing to notice is that when I restart the scheduler, the tasks change to the running state.

Answer 1:

I've been running a (modified) fork of the same repo as well, predominantly on Airflow 1.8, for about a year with 10M+ task instances. I think the issue persists in 1.9, but I'm not completely sure.

For whatever reason, there seems to be a long-standing bug in Airflow where scheduler performance degrades over time. I've looked into the scheduler code, but I'm still a bit unclear on exactly what happens differently on a fresh start that kicks it back into scheduling normally again. (One major difference is that scheduled and queued task states are rebuilt.)

The doc Scheduler Basics in the Airflow wiki provides a nice concise reference on how the scheduler works and its various states.

Most people solve this problem by restarting the scheduler regularly. I've personally found success with a 1-hour interval, though your number of tasks, task durations, and parallelism settings are worth considering when choosing the restart interval.
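If the scheduler runs under systemd, one way to get such a periodic restart (a sketch; it assumes systemd ≥ 229 where `RuntimeMaxSec=` is available, and the unit name and path are hypothetical) is a drop-in override:

```ini
; /etc/systemd/system/airflow-scheduler.service.d/restart.conf (hypothetical path)
[Service]
Restart=always
; systemd terminates the service after 1 hour; Restart=always brings it back up
RuntimeMaxSec=3600
```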

For more info: this used to be addressed by restarting the scheduler every X runs using the SCHEDULER_RUNS config setting, although that setting was recently removed from the default systemd scripts.
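For illustration, the old systemd script achieved this by exiting the scheduler after a fixed number of scheduler loops and letting systemd restart it, roughly like this (an excerpt-style sketch, not the verbatim script):

```ini
[Service]
Environment="SCHEDULER_RUNS=5"
Restart=always
; -n/--num_runs makes the 1.x scheduler exit after that many scheduler loops,
; and Restart=always then relaunches it, giving a periodic fresh start
ExecStart=/usr/bin/airflow scheduler -n ${SCHEDULER_RUNS}
```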

You might also consider posting to the Airflow dev mailing list. I know this has been discussed there a few times and one of the core contributors may be able to provide additional context. (If they do, I'd be happy to update this answer to reflect that.)

Answer 2:

Airflow can be a bit tricky to set up.

  • Do you have the airflow scheduler running?
  • Do you have the airflow webserver running?
  • Have you checked that all the DAGs you want to run are switched On in the web UI?
  • Do all the DAGs you want to run have a start date in the past?
  • Do all the DAGs you want to run have a proper schedule, shown in the web UI?
  • If nothing else works, use the web UI to click on the DAG, then on Graph View. Select the first task and click on Task Instance. The Task Instance Details section will tell you why the task is waiting or not running.

For instance, I once had a DAG that was wrongly set to depends_on_past: True, which prevented the current instance from starting correctly.
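To illustrate the effect (a toy sketch of the gating behavior, not Airflow's actual dependency code; all names are made up): with depends_on_past=True, a task instance can only start once the same task succeeded in the previous DAG run, so a single old failure leaves every later run stuck in the queued/no-status state.

```python
# Toy model (not Airflow internals): depends_on_past gates a task instance
# on the terminal state of the same task in the previous DAG run.
def can_run(task_id, run_number, history, depends_on_past=True):
    """history maps (task_id, run_number) -> terminal state string."""
    if not depends_on_past:
        return True
    past = [n for (t, n) in history if t == task_id and n < run_number]
    if not past:            # very first run: nothing to depend on
        return True
    return history[(task_id, max(past))] == "success"

history = {("load", 1): "failed"}
print(can_run("load", 2, history))                         # False: stuck on the old failure
print(can_run("load", 2, history, depends_on_past=False))  # True: gate disabled
```

Clearing or marking the old failed task instance as successful (or dropping depends_on_past) is what unblocks the later runs.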

The docs also include a great resource with a few more hints: "Why isn't my task getting scheduled?".


