Question:
Airflow is randomly not running queued tasks; some tasks don't even reach the queued state. I keep seeing the line below in the scheduler logs:
[2018-02-28 02:24:58,780] {jobs.py:1077} INFO - No tasks to consider for execution.
I do see tasks in the database that have either no status or a queued status, but they never get started.
The Airflow setup is running https://github.com/puckel/docker-airflow on ECS with Redis. There are 4 scheduler threads and 4 Celery worker tasks. The tasks that are not running show in the queued state (grey icon); when hovering over the task icon, the operator is null, and the task details say:
All dependencies are met but the task instance is not running. In most cases this just means that the task will probably be scheduled soon unless: - The scheduler is down or under heavy load
Metrics on the scheduler do not show heavy load. The DAG is very simple, with two independent tasks dependent only on their last run. There are also tasks in the same DAG that are stuck with no status (white icon).
Interestingly, when I restart the scheduler, the tasks change to the running state.
Answer 1:
I'm running a (modified) fork of the same repo as well, predominantly on Airflow 1.8 for about a year with 10M+ task instances. I think the issue persists in 1.9, but I'm not completely sure.
For whatever reason, there seems to be a long-standing bug in Airflow that the scheduler performance degrades over time. I've looked into the scheduler code but I'm still a bit unclear on exactly what happens differently on a fresh start that makes the difference to kick it back into scheduling normally again. (One major difference is that scheduled and queued task states are rebuilt.)
The doc Scheduler Basics in the Airflow wiki provides a nice concise reference on how the scheduler works and its various states.
Most people solve this problem by restarting the scheduler regularly. I've personally found success with a 1-hour interval, though your number of tasks, task durations, and parallelism settings are worth considering when choosing the restart interval.
For more info: this used to be addressed by restarting every X runs using the SCHEDULER_RUNS config setting, although that setting was recently removed from the default systemd scripts.
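As a sketch of that restart approach (assuming a systemd-managed scheduler; the path and loop count are illustrative, not from the original setup), the scheduler can be told to exit after a fixed number of scheduling loops with the `-n`/`--num_runs` flag, letting the supervisor start a fresh process:

```ini
# Illustrative systemd unit fragment (path and loop count are assumptions).
# "airflow scheduler -n 5" exits after 5 scheduling loops; Restart=always
# then launches a fresh scheduler process, clearing any degraded state.
[Service]
ExecStart=/usr/local/bin/airflow scheduler -n 5
Restart=always
RestartSec=5s
```

On ECS, the equivalent is letting the task definition's restart behavior relaunch the scheduler container after it exits.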
You might also consider posting to the Airflow dev mailing list. I know this has been discussed there a few times and one of the core contributors may be able to provide additional context. (If they do, I'd be happy to update this answer to reflect that.)
Answer 2:
Airflow can be a bit tricky to set up.
- Do you have the airflow scheduler running?
- Do you have the airflow webserver running?
- Have you checked that all the DAGs you want to run are set to On in the web UI?
- Do all the DAGs you want to run have a start date in the past?
- Do all the DAGs you want to run have a proper schedule, shown in the web UI?
- If nothing else works, use the web UI: click on the DAG, then on Graph View, select the first task, and click on Task Instance. The Task Instance Details section will show why a task is waiting or not running.
For instance, I once had a DAG that was wrongly set to depends_on_past: True, which prevented the current instance from starting correctly.
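To illustrate that pitfall, here is a minimal default_args sketch (owner, dates, and values are illustrative, not from the original DAG); such a dict would be passed to DAG(default_args=...):

```python
from datetime import datetime

# Illustrative default_args for a DAG (all values are assumptions).
# depends_on_past=True makes each task instance wait for its own previous
# run to succeed, which can leave tasks queued or with no status forever
# if an earlier run never completed; keep it False unless runs really
# must be strictly sequential.
default_args = {
    "owner": "airflow",
    "start_date": datetime(2018, 1, 1),  # must be in the past to be scheduled
    "depends_on_past": False,
}
```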
There is also a great resource directly in the docs with a few more hints: "Why isn't my task getting scheduled?".