Airflow tasks get stuck at “queued” status and never get running

Submitted by Anonymous (unverified) on 2019-12-03 01:48:02

Question:

I'm using Airflow v1.8.1 and run all components (worker, web, flower, scheduler) on Kubernetes & Docker. I use the Celery Executor with Redis, and my tasks look like:

(start) -> (do_work_for_product1)
        ├  -> (do_work_for_product2)
        ├  -> (do_work_for_product3)
        ├  …
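For concreteness, here is a minimal sketch of what such a fan-out could look like in Airflow 1.8 code; the dag_id, the product list, and the bash commands are made up for illustration:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.dummy_operator import DummyOperator

dag = DAG(
    "product_workflow",        # hypothetical dag_id
    start_date=datetime(2017, 8, 1),
    schedule_interval=None,    # triggered manually, as in the question
)

start = DummyOperator(task_id="start", dag=dag)

# One downstream task per product, all fanning out from `start`.
for product in ["product1", "product2", "product3"]:
    work = BashOperator(
        task_id="do_work_for_%s" % product,
        bash_command="echo working on %s" % product,
        dag=dag,
    )
    start.set_downstream(work)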

So the start task has multiple downstreams. And I set up the concurrency-related configuration as below:

parallelism = 3
dag_concurrency = 3
max_active_runs = 1
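Of these, parallelism and dag_concurrency are global defaults set under [core] in airflow.cfg, while max_active_runs maps to a per-DAG argument of the same name (its global default is max_active_runs_per_dag). As a minimal sketch, assuming the Airflow 1.8 DAG constructor, the per-DAG equivalents of the last two settings look like:

from datetime import datetime
from airflow import DAG

dag = DAG(
    "product_workflow",   # hypothetical dag_id
    start_date=datetime(2017, 8, 1),
    concurrency=3,        # at most 3 task instances of this DAG run at once
    max_active_runs=1,    # at most 1 active DagRun of this DAG at a time
)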

Then when I run this DAG manually (I'm not sure whether it also happens on scheduled runs), some downstreams get executed, but others get stuck at "queued" status.

If I clear the task from the Admin UI, it gets executed. There is no worker log (after processing the first few downstreams, the worker just stops emitting any logs).

Web server's log (not sure whether the worker exiting is related):

/usr/local/lib/python2.7/dist-packages/flask/exthook.py:71: ExtDeprecationWarning: Importing flask.ext.cache is deprecated, use flask_cache instead.
  .format(x=modname), ExtDeprecationWarning
[2017-08-24 04:20:56,496] [51] {models.py:168} INFO - Filling up the DagBag from /usr/local/airflow_dags
[2017-08-24 04:20:57 +0000] [27] [INFO] Handling signal: ttou
[2017-08-24 04:20:57 +0000] [37] [INFO] Worker exiting (pid: 37)

There is no error log on the scheduler either. And the number of tasks that get stuck changes each time I try this.

Because I also use Docker, I'm wondering if this is related: https://github.com/puckel/docker-airflow/issues/94 But so far, no clue.

Has anyone faced a similar issue, or does anyone have an idea of what I can investigate here...?

Answer 1:

Tasks getting stuck is, most likely, a bug. At the moment (pre-1.9) it is not fixed in a released version; a patch targeting 1.9 should resolve that issue.

It is worth investigating why your tasks never reach the RUNNING state. Setting itself to this state is the first thing a task does. Normally the worker logs before it starts executing, and it also reports any errors. You should be able to find entries of this in the task log.
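If the task log has no entries at all, one way to see which task instances are sitting in QUEUED is to query the metadata database through Airflow's own session. A minimal diagnostic sketch, assuming Airflow 1.8.x import paths:

from airflow import settings
from airflow.models import TaskInstance
from airflow.utils.state import State

session = settings.Session()
# List every task instance currently stuck in the QUEUED state.
queued = (
    session.query(TaskInstance)
    .filter(TaskInstance.state == State.QUEUED)
    .all()
)
for ti in queued:
    print("%s.%s @ %s" % (ti.dag_id, ti.task_id, ti.execution_date))
session.close()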

Edit: As was mentioned in the comments on the original question, one example of Airflow not being able to run a task is when it cannot write to required locations. This makes it unable to proceed, and tasks get stuck. The patch fixes this by failing the task from the scheduler.



Answer 2:

We have a workaround and want to share it here before 1.9 becomes official. Thanks to Bolke de Bruin for the updates on 1.9. In my situation before 1.9 (we are currently on 1.8.1), the workaround is to have another DAG running that clears any task stuck in the queued state if it stays there for over 30 minutes.
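A minimal sketch of such a clean-up DAG, assuming Airflow 1.8.x APIs; the dag_id, the 10-minute schedule, and the reliance on TaskInstance.queued_dttm (the timestamp of when a task entered QUEUED) are assumptions for illustration:

from datetime import datetime, timedelta

from airflow import DAG, settings
from airflow.models import TaskInstance
from airflow.operators.python_operator import PythonOperator
from airflow.utils.state import State


def clear_stuck_queued_tasks():
    # Reset task instances that have sat in QUEUED for over 30 minutes,
    # which has the same effect as clearing them from the Admin UI.
    session = settings.Session()
    cutoff = datetime.utcnow() - timedelta(minutes=30)
    stuck = (
        session.query(TaskInstance)
        .filter(TaskInstance.state == State.QUEUED)
        .filter(TaskInstance.queued_dttm < cutoff)  # assumes queued_dttm is populated
        .all()
    )
    for ti in stuck:
        ti.state = State.NONE  # None state: the scheduler will schedule it again
        session.merge(ti)
    session.commit()
    session.close()


dag = DAG(
    "clear_stuck_queued_tasks",   # hypothetical dag_id
    start_date=datetime(2017, 8, 1),
    schedule_interval=timedelta(minutes=10),
)

PythonOperator(
    task_id="clear_stuck_queued_tasks",
    python_callable=clear_stuck_queued_tasks,
    dag=dag,
)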


