Running more than 32 concurrent tasks in Apache Airflow

Question


I'm running Apache Airflow 1.8.1. I would like to run more than 32 concurrent tasks on my instance, but cannot get any of the configurations to work.

I am using the CeleryExecutor, the Airflow config in the UI shows 64 for both parallelism and dag_concurrency, and I've restarted the Airflow scheduler, webserver, and workers numerous times (I'm actually testing this locally in a Vagrant machine, but have also tested it on an EC2 instance).

airflow.cfg

# The amount of parallelism as a setting to the executor. This defines
# the max number of task instances that should run simultaneously
# on this airflow installation
parallelism = 64

# The number of task instances allowed to run concurrently by the scheduler
dag_concurrency = 64

Example DAG. I've tried it both with and without the concurrency argument set directly on the DAG; the variant with it is sketched after the code.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    'concurrency_dev',
    default_args={
        'owner': 'airflow',
        'depends_on_past': False,
        'start_date': datetime(2018, 1, 1),
    },
    schedule_interval=None,
    catchup=False
)

# 40 identical one-minute tasks with no dependencies between them,
# so all 40 are eligible to run at the same time
for i in range(40):
    BashOperator(
        task_id='concurrency_dev_{i}'.format(i=i),
        bash_command='sleep 60',
        dag=dag
    )
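
For reference, the variant with the per-DAG cap set explicitly looked like this; the value 64 is an assumption chosen to match the airflow.cfg above, not something stated in the original post:

dag = DAG(
    'concurrency_dev',
    default_args={
        'owner': 'airflow',
        'depends_on_past': False,
        'start_date': datetime(2018, 1, 1),
    },
    schedule_interval=None,
    catchup=False,
    concurrency=64  # per-DAG running-task cap; assumed to match the config above
)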

Regardless, only 32 tasks are ever executed simultaneously.


Answer 1:


If you have 2 workers and celeryd_concurrency = 16, then you're limited to 2 × 16 = 32 tasks. If non_pooled_task_slot_count = 32, you'd also be limited. Of course, parallelism and dag_concurrency need to be set above 32 not only on the webservers and schedulers, but on the workers too.
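
To make the arithmetic concrete, here is a minimal airflow.cfg sketch for that setup, assuming 2 Celery workers; the values are illustrative, not Airflow defaults, and the same file needs to be deployed to every machine:

[core]
parallelism = 64                  # cluster-wide cap on running task instances
dag_concurrency = 64              # per-DAG cap enforced by the scheduler
non_pooled_task_slot_count = 64   # cap for tasks that aren't assigned to a pool

[celery]
celeryd_concurrency = 32          # slots per worker; 2 workers × 32 = 64 total

After changing the file, restart the scheduler, webserver, and every worker so each process picks up the new limits.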



Source: https://stackoverflow.com/questions/53640246/running-more-than-32-concurrent-tasks-in-apache-airflow
