Will an Airflow celery worker be blocked if the number of sensors is larger than the concurrency?

Submitted by 可紊 on 2019-12-01 16:25:05

In the Gotchas section of ETL Best Practices with Airflow, the author addresses this general problem. One of the suggestions is to set up a pool for your sensor tasks so that your other tasks don't get starved. For your situation, determine the number of sensor tasks that you want running at one time (less than your concurrency level) and set up a pool with that number as its slot limit. Once your pool is set up, pass the pool argument to each of your sensor operators. For more on pools, see Airflow's documentation on concepts. Here is an example of passing a pool argument to an operator:

from datetime import timedelta
from airflow.operators.bash_operator import BashOperator

# `dag` and `aggregate_db_message_job_cmd` are assumed to be defined elsewhere.
aggregate_db_message_job = BashOperator(
    task_id='aggregate_db_message_job',
    execution_timeout=timedelta(hours=3),
    pool='ep_data_pipeline_db_msg_agg',  # limits how many slots this task can occupy
    bash_command=aggregate_db_message_job_cmd,
    dag=dag)
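
Since the advice above is specifically about sensors, here is a minimal sketch of a sensor task constrained by a pool. The sensor class, task id, DAG, and pool name ('sensor_pool') are all hypothetical, the import paths assume Airflow 1.10.x, and the pool itself must be created beforehand (Admin -> Pools in the UI, or via the pool CLI subcommand):

from datetime import datetime
from airflow import DAG
from airflow.sensors.base_sensor_operator import BaseSensorOperator

class UpstreamReadySensor(BaseSensorOperator):
    """Hypothetical sensor; replace poke() with a real readiness check."""
    def poke(self, context):
        return False  # e.g. check that a file has landed or a partition exists

dag = DAG(
    dag_id='sensor_pool_example',
    start_date=datetime(2019, 1, 1),
    schedule_interval='@daily')

wait_for_upstream = UpstreamReadySensor(
    task_id='wait_for_upstream',
    pool='sensor_pool',    # at most <pool slots> of these occupy workers at once
    poke_interval=60,      # seconds between pokes
    timeout=6 * 60 * 60,   # give up after six hours of waiting
    dag=dag)

In the default (blocking) sensor mode, each sensor holds its slot for as long as it pokes, which is exactly why capping sensors with a pool keeps them from starving the rest of your tasks.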