Question
I'm running Apache Airflow 1.8.1. I would like to run more than 32 concurrent tasks on my instance, but cannot get any of the configurations to work.
I am using the CeleryExecutor, the Airflow config in the UI shows 64 for both parallelism and dag_concurrency, and I've restarted the Airflow scheduler, webserver, and workers numerous times (I'm actually testing this locally in a Vagrant machine, but have also tested it on an EC2 instance).
airflow.cfg
# The amount of parallelism as a setting to the executor. This defines
# the max number of task instances that should run simultaneously
# on this airflow installation
parallelism = 64
# The number of task instances allowed to run concurrently by the scheduler
dag_concurrency = 64
Example DAG. I've tried it both without and with the concurrency argument set directly on the DAG (a sketch of that variant follows the code below).
from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
dag = DAG(
    'concurrency_dev',
    default_args={
        'owner': 'airflow',
        'depends_on_past': False,
        'start_date': datetime(2018, 1, 1),
    },
    schedule_interval=None,
    catchup=False
)

for i in range(0, 40):
    BashOperator(
        task_id='concurrency_dev_{i}'.format(i=i),
        bash_command='sleep 60',
        dag=dag
    )
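The with-concurrency variant differed only in the DAG constructor; roughly this (concurrency is the per-DAG cap on running task instances):

dag = DAG(
    'concurrency_dev',
    default_args={
        'owner': 'airflow',
        'depends_on_past': False,
        'start_date': datetime(2018, 1, 1),
    },
    schedule_interval=None,
    catchup=False,
    concurrency=64,  # per-DAG limit; made no difference here
)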
Regardless, only 32 tasks are ever executed simultaneously.
Answer 1:
If you have 2 workers and celeryd_concurrency = 16, then you're limited to 32 tasks. If non_pooled_task_slot_count = 32, you'd also be limited. Of course, parallelism and dag_concurrency need to be set above 32 not only on the webserver and scheduler, but on the workers too.
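In practice that means raising all four settings and deploying the same airflow.cfg to every machine. A minimal sketch, assuming the Airflow 1.8 option names and sections (celeryd_concurrency lives under [celery], the rest under [core]):

[core]
# Hard cap on task instances running across the whole installation
parallelism = 64
# Cap on task instances the scheduler will run per DAG
dag_concurrency = 64
# Tasks not assigned to a pool draw from this shared slot count;
# if it sits at 32 it caps you exactly as described above
non_pooled_task_slot_count = 64

[celery]
# Task instances each Celery worker runs at once;
# 2 workers x 16 = 32, which matches the observed ceiling
celeryd_concurrency = 32

After editing, restart the scheduler, webserver, and every worker so the new values take effect; the -c/--concurrency flag of the airflow worker command can also override celeryd_concurrency at launch if you'd rather set it there.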
Source: https://stackoverflow.com/questions/53640246/running-more-than-32-concurrent-tasks-in-apache-airflow