Question
I use Airflow v1.7.1.3.
I have two DAGs, dag_a and dag_b. I trigger 10 dag_a runs at one time, which in theory should execute one by one. In reality, the 10 dag_a runs execute in parallel. The concurrency parameter doesn't seem to work. Can anyone tell me why?
Here's the pseudocode:
in dag_a.py:

from datetime import datetime
from airflow import DAG

dag = DAG('dag_a',
          start_date=datetime.now(),
          default_args=default_args,
          schedule_interval=None,
          concurrency=1,
          max_active_runs=1)
in dag_b.py:

import logging
import time
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from fabric.api import local

log = logging.getLogger(__name__)

dag = DAG('dag_b',
          start_date=datetime.now(),
          default_args=default_args,
          schedule_interval='0 22 */1 * *',
          concurrency=1,
          max_active_runs=1)

def trigger_dag_a(**context):
    # Trigger dag_a 10 times, two seconds apart, via the CLI.
    for rec in range(10):
        time.sleep(2)
        cmd = "airflow trigger_dag dag_a"
        log.info("cmd: %s" % cmd)
        msg = local(cmd)  # "local" is a function from fabric
        log.info(msg)

trigger_dag_a_proc = PythonOperator(python_callable=trigger_dag_a,
                                    provide_context=True,
                                    task_id='trigger_dag_a_proc',
                                    dag=dag)
Answer 1:
You can limit your task instances by specifying a pool.
1. Create a pool in the UI:
2. Then set up your DAGs to use this pool:
from datetime import datetime
from airflow import DAG

default_args = {
    'email_on_failure': False,
    'email_on_retry': False,
    'start_date': datetime(2017, 12, 16),
    'pool': 'my_pool'
}

dag = DAG(
    dag_id='foo',
    schedule_interval='@daily',
    default_args=default_args,
)
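If you prefer not to put the pool in default_args, the pool can also be attached to individual operators, since pool is a parameter of every Airflow operator. A minimal sketch, assuming a pool named my_pool created with a single slot; the BashOperator task here is purely illustrative:

from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id='dag_a',
    start_date=datetime(2017, 12, 16),
    schedule_interval=None,
)

# With a 1-slot pool, only one 'do_work' task instance can run at a time,
# no matter how many dag_a runs have been triggered.
do_work = BashOperator(
    task_id='do_work',
    bash_command='echo "working"',
    pool='my_pool',
    dag=dag,
)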
Answer 2:
AFAIK, externally triggered DAG runs do not respect the concurrency/max_active_runs parameters of the DAG; the same applies to backfills.
Only DAG runs created by the scheduler respect these parameters.
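To illustrate the distinction this answer draws, a hedged sketch (parameter values are illustrative): runs the scheduler creates for a DAG with a schedule_interval are throttled by max_active_runs, whereas runs created externally with airflow trigger_dag are not on this Airflow version, which is why the pool approach from Answer 1 is needed to serialize them:

from datetime import datetime
from airflow import DAG

# Scheduler-created runs of this DAG are limited to one active run at a time.
dag = DAG(
    'dag_a',
    start_date=datetime(2017, 12, 16),
    schedule_interval='@daily',
    concurrency=1,
    max_active_runs=1,
)

# Runs created externally, e.g.
#   airflow trigger_dag dag_a
# bypass that limit, so triggering dag_a in a loop still produces
# parallel runs unless the tasks share a 1-slot pool.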
Source: https://stackoverflow.com/questions/45752912/how-control-dag-concurrency-in-airflow