CeleryExecutor in Airflow is not parallelizing tasks in a subdag

Submitted by 喜你入骨 on 2021-02-19 02:33:47

Question


We're using Airflow 1.10.0, and while analyzing why some of our ETL processes take so long, we noticed that our subdags run with a SequentialExecutor instead of the CeleryExecutor we configured.

I would like to know whether this is a bug or expected Airflow behavior. It doesn't make sense to have the capability to execute tasks in parallel and then lose it for one specific kind of task.


Answer 1:


It is a typical pattern to use the SequentialExecutor in subdags, the idea being that you are often executing a lot of similar, related tasks and don't necessarily want the added overhead of queueing each of them through Celery. See the "other tips" section in the Airflow docs for subdags: https://airflow.apache.org/concepts.html#subdags

By default, subdags are set to use the SequentialExecutor (see: https://github.com/apache/incubator-airflow/blob/v1-10-stable/airflow/operators/subdag_operator.py#L38), but you can change that.

To use the CeleryExecutor, pass it in when you create the SubDagOperator:

from airflow.executors.celery_executor import CeleryExecutor
from airflow.operators.subdag_operator import SubDagOperator

mysubdag = SubDagOperator(
    executor=CeleryExecutor(),  # overrides the SequentialExecutor default
    ...
)
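
For reference, a minimal end-to-end sketch, assuming Airflow 1.10.x; the parent DAG name etl_parent, the build_subdag factory, and the dummy task names are illustrative placeholders rather than anything from the original question:

from datetime import datetime

from airflow import DAG
from airflow.executors.celery_executor import CeleryExecutor
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.subdag_operator import SubDagOperator

default_args = {"owner": "airflow", "start_date": datetime(2018, 8, 1)}


def build_subdag(parent_dag_id, task_id, args):
    # The sub-DAG's dag_id must follow the '<parent_dag_id>.<task_id>' convention.
    subdag = DAG(
        dag_id="{}.{}".format(parent_dag_id, task_id),
        default_args=args,
        schedule_interval="@daily",
    )
    # These tasks have no dependencies between them, so a parallel
    # executor can run them at the same time.
    for i in range(4):
        DummyOperator(task_id="parallel_task_{}".format(i), dag=subdag)
    return subdag


with DAG("etl_parent", default_args=default_args, schedule_interval="@daily") as dag:
    SubDagOperator(
        task_id="load_section",
        subdag=build_subdag("etl_parent", "load_section", default_args),
        executor=CeleryExecutor(),  # run the sub-DAG's tasks through Celery workers
    )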



Answer 2:


Maybe a little bit late, but using the LocalExecutor works for me.

from airflow.executors.local_executor import LocalExecutor
from airflow.operators.subdag_operator import SubDagOperator

subdag = SubDagOperator(
    task_id=task_id,
    subdag=child_dag,  # placeholder: the sub-DAG object built by your subdag factory
    default_args=default_args,
    executor=LocalExecutor(),  # parallel local processes instead of the SequentialExecutor default
    dag=dag,
)



Source: https://stackoverflow.com/questions/51885410/celeryexecutor-in-airflow-are-not-parallelizing-tasks-in-a-subdag
