Question:
I am new to Airflow and am trying to set it up to run ETL pipelines. I was able to install
- airflow
- postgres
- celery
- rabbitmq
I am able to test-run the tutorial DAG. When I try to schedule the jobs, the scheduler picks them up and queues them, which I can see in the UI, but the tasks never run. Could somebody help me fix this issue? I believe I am missing a very basic Airflow concept here.
Here is my config file:
[core]
airflow_home = /root/airflow
dags_folder = /root/airflow/dags
base_log_folder = /root/airflow/logs
executor = CeleryExecutor
sql_alchemy_conn = postgresql+psycopg2://xxxx.amazonaws.com:5432/airflow
api_client = airflow.api.client.local_client

[webserver]
web_server_host = 0.0.0.0
web_server_port = 8080
web_server_worker_timeout = 120
worker_refresh_batch_size = 1
worker_refresh_interval = 30

[celery]
celery_app_name = airflow.executors.celery_executor
celeryd_concurrency = 16
worker_log_server_port = 8793
broker_url = amqp://rabbit:rabbit@x.x.x.x/rabbitmq_vhost
celery_result_backend = db+postgresql+psycopg2://postgres:airflow@xxx.amazonaws.com:5432/airflow
flower_host = 0.0.0.0
flower_port = 5555
default_queue = default
DAG: This is the tutorial DAG I used,
and the start date for my DAG is 'start_date': datetime(2017, 4, 11).
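For reference, here is a minimal sketch of what the stock Airflow 1.x tutorial DAG looks like with that start date; the exact file shipped with your Airflow version may differ slightly:

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# Default arguments applied to every task in the DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2017, 4, 11),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG('tutorial', default_args=default_args, schedule_interval=timedelta(days=1))

# Two simple BashOperator tasks; t2 runs only after t1 succeeds
t1 = BashOperator(task_id='print_date', bash_command='date', dag=dag)
t2 = BashOperator(task_id='sleep', bash_command='sleep 5', dag=dag)
t2.set_upstream(t1)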
Answer 1:
Have you run all three components of Airflow, namely:
airflow webserver
airflow scheduler
airflow worker
If you only run the first two, the tasks will be queued but not executed. airflow worker provides the workers that actually execute the DAGs.
Also, Celery 4.0.2 is not compatible with Airflow 1.7 or 1.8 at the moment; use Celery 3 instead.
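If you want to confirm that a worker is actually connected to the broker (rather than tasks just sitting in the queue), here is a quick sketch using the Celery control API, assuming the placeholder broker_url from the config above:

from celery import Celery

# Point at the same broker that airflow.cfg uses (placeholder credentials/host)
app = Celery(broker='amqp://rabbit:rabbit@x.x.x.x/rabbitmq_vhost')

# ping() returns a mapping of worker name to reply, or None if no worker responds;
# if nothing answers, queued tasks will never be picked up
replies = app.control.inspect().ping()
print(replies)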
Answer 2:
I tried to upgrade to Airflow v1.8 today as well and struggled with Celery and RabbitMQ. What helped was changing from librabbitmq (which is used by default with the plain amqp scheme) to pyamqp in airflow.cfg:
broker_url = pyamqp://rabbit:rabbit@x.x.x.x/rabbitmq_vhost
(This is where I got the idea: https://github.com/celery/celery/issues/3675)
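To verify that the new transport URL actually connects, one option is a quick check with kombu (which ships with Celery); this is only a sketch, and the URL below is the same placeholder as above:

from kombu import Connection

# Same placeholder broker URL as in airflow.cfg, using the pyamqp transport
with Connection('pyamqp://rabbit:rabbit@x.x.x.x/rabbitmq_vhost') as conn:
    conn.connect()          # raises if the broker is unreachable or credentials are wrong
    print(conn.connected)   # True once the connection is established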
Answer 3:
I realise your problem is already answered and was related to a celery version mismatch, but I've also seen tasks queue and never run because I changed the logs location to a place where the airflow service user did not have permission to write.
In the example airflow.cfg given in the question above: base_log_folder = /root/airflow/logs
I am using an AWS EC2 machine and changed the logs to write to base_log_folder = /mnt/airflow/logs.
In the UI there is no indication of why tasks are queued; it just says "unknown, all dependencies are met ...". Giving the airflow daemon/service user permission to write to that folder fixed it.
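A quick way to check for this (a small sketch; run it as the user the Airflow services run under, and substitute your own base_log_folder):

import os

# base_log_folder from airflow.cfg; substitute your own path
log_dir = '/mnt/airflow/logs'

# True only if the directory exists and the current user can create files in it
print(os.path.isdir(log_dir) and os.access(log_dir, os.W_OK))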