Run parallel tasks in Apache Airflow

Submitted by 自作多情 on 2019-12-09 16:38:03

Question


I am able to configure the airflow.cfg file to run tasks one after the other.

What I want to do is execute tasks in parallel, e.g. 2 at a time, until the end of the list is reached.

How can I configure this?


Answer 1:


Executing tasks in Airflow in parallel depends on which executor you're using, e.g., SequentialExecutor, LocalExecutor, CeleryExecutor, etc.

For a simple setup, you can achieve parallelism by just setting your executor to LocalExecutor in your airflow.cfg:

[core]
executor = LocalExecutor

Reference: https://github.com/apache/incubator-airflow/blob/29ae02a070132543ac92706d74d9a5dc676053d9/airflow/config_templates/default_airflow.cfg#L76

This will spin up a separate process for each task.

(Of course you'll need to have a DAG with at least 2 tasks that can execute in parallel to see it work.)

Alternatively, with CeleryExecutor, you can spin up any number of workers by running the following as many times as you want (in Airflow 2.x the command is airflow celery worker):

$ airflow worker

The tasks will go into a Celery queue and each Celery worker will pull off of the queue.

You might find the section "Scaling out with Celery" in the Airflow configuration docs helpful:

https://airflow.apache.org/howto/executor/use-celery.html
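To picture the pull model described above, here is a plain-Python analogy using only the standard library. It is not Celery or Airflow code: the names NUM_WORKERS, TASKS and worker are made up for illustration, and threads stand in for separate worker processes. Two workers drain one shared queue, so at most two tasks run at a time until the list is exhausted.

```python
import queue
import threading
import time

NUM_WORKERS = 2  # hypothetical: like starting two workers
TASKS = ["task_%d" % i for i in range(6)]  # hypothetical task names

# Shared queue that every worker pulls from.
task_queue = queue.Queue()
for name in TASKS:
    task_queue.put(name)

lock = threading.Lock()
state = {"running": 0, "peak": 0}  # bookkeeping to observe concurrency
finished = []


def worker():
    """Pull tasks off the shared queue until it is empty."""
    while True:
        try:
            name = task_queue.get_nowait()
        except queue.Empty:
            return  # queue drained, this worker exits
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        time.sleep(0.01)  # stand-in for the real work of a task
        with lock:
            state["running"] -= 1
            finished.append(name)


threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("all tasks done:", sorted(finished) == sorted(TASKS))
print("never more than", NUM_WORKERS, "at once:", state["peak"] <= NUM_WORKERS)
```

Starting another worker in this sketch only means raising NUM_WORKERS. The queue and the tasks stay the same, which is the property that makes the Celery setup easy to scale out.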

For any executor, you may want to tweak the core settings that control parallelism once you have that running.

They're all found under [core]. These are the defaults:

# The amount of parallelism as a setting to the executor. This defines
# the max number of task instances that should run simultaneously
# on this airflow installation
parallelism = 32

# The number of task instances allowed to run concurrently by the scheduler
dag_concurrency = 16

# Are DAGs paused by default at creation
dags_are_paused_at_creation = True

# When not using pools, tasks are run in the "default pool",
# whose size is guided by this config element
non_pooled_task_slot_count = 128

# The maximum number of active DAG runs per DAG
max_active_runs_per_dag = 16

Reference: https://github.com/apache/incubator-airflow/blob/29ae02a070132543ac92706d74d9a5dc676053d9/airflow/config_templates/default_airflow.cfg#L99
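As a rough way to reason about how these settings combine, the sketch below parses a [core] section with Python's standard configparser and computes an upper bound on how many tasks of a single DAG can run at once. The helper names (load_core, per_dag_task_ceiling, env_override_name) are hypothetical, and the min() rule is my reading of the setting descriptions above for tasks that do not use a custom pool, so treat it as an approximation rather than Airflow's own scheduling logic. It also shows the AIRFLOW__SECTION__KEY environment variable naming that Airflow accepts as an override for airflow.cfg values.

```python
import configparser

# Sample [core] section using the defaults quoted above.
SAMPLE_CFG = """
[core]
executor = LocalExecutor
parallelism = 32
dag_concurrency = 16
non_pooled_task_slot_count = 128
max_active_runs_per_dag = 16
"""


def load_core(cfg_text):
    """Parse airflow.cfg-style text and return its [core] section."""
    # interpolation=None so literal '%' characters in values do not raise.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser["core"]


def per_dag_task_ceiling(core):
    """Hypothetical helper: upper bound on simultaneously running
    tasks of one DAG when no custom pool is used (an approximation)."""
    if core.get("executor") == "SequentialExecutor":
        return 1  # SequentialExecutor runs one task at a time
    return min(
        core.getint("parallelism"),
        core.getint("dag_concurrency"),
        core.getint("non_pooled_task_slot_count"),
    )


def env_override_name(section, key):
    """Build the environment variable name that overrides a config key."""
    return "AIRFLOW__{}__{}".format(section.upper(), key.upper())


core = load_core(SAMPLE_CFG)
print(per_dag_task_ceiling(core))  # 16 with the defaults

# To get "2 at a time" for each DAG, lower dag_concurrency.
core["dag_concurrency"] = "2"
print(per_dag_task_ceiling(core))  # 2

print(env_override_name("core", "dag_concurrency"))
```

Under these assumptions, the question's "2 at a time" corresponds to a parallel-capable executor plus dag_concurrency = 2 (or parallelism = 2 to cap the whole installation).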



Source: https://stackoverflow.com/questions/50184012/run-parallel-tasks-in-apache-airflow
