Want to create Airflow tasks that are downstream of the current task

馋奶兔 submitted on 2019-12-08 17:31:31

Regarding your proposed solution, I don't think you can use XComs to achieve this: to the best of my knowledge, XComs are only available to task instances at run time, not when you define the DAG.

You can, however, use a SubDAG to achieve your objective. The SubDagOperator takes a DAG object, which you can build with a factory function. Because that function is called every time the DAG file is parsed, it gives you a chance to dynamically create a sub-section of your workflow.

You can test the idea using this simple example, which generates a random number of tasks every time it's invoked:

import airflow
from builtins import range
from random import randint
from airflow.operators.bash_operator import BashOperator
from airflow.operators.subdag_operator import SubDagOperator
from airflow.models import DAG

args = {
    'owner': 'airflow',
    'start_date': airflow.utils.dates.days_ago(2)
}

dag = DAG(dag_id='dynamic_dag', default_args=args)

def generate_subdag(parent_dag, dag_id, default_args):
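    # pseudo-randomly determine a number of tasks to be created;
    # this runs every time the DAG file is parsed
    n_tasks = randint(1, 10)

    # a SubDAG's dag_id must have the form '<parent_dag_id>.<subdag_task_id>'
    subdag = DAG(
        '%s.%s' % (parent_dag.dag_id, dag_id),
        schedule_interval=parent_dag.schedule_interval,
        start_date=parent_dag.start_date,
        default_args=default_args
    )
    # create independent (parallel) tasks echo_0 .. echo_<n_tasks - 1>
    for i in range(n_tasks):
        i = str(i)
        BashOperator(task_id='echo_%s' % i, bash_command='echo %s' % i, dag=subdag)

    return subdag

subdag_dag_id = 'dynamic_subdag'

# the SubDagOperator's task_id must match the suffix of the SubDAG's dag_id
SubDagOperator(
    subdag=generate_subdag(dag, subdag_dag_id, args),
    task_id=subdag_dag_id,
    dag=dag
)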

If you run this, you'll notice that the SubDAG is likely to contain a different number of tasks in different runs (I tested this with version 1.8.0). You can see the SubDAG in the web UI by opening the Graph View, clicking the grey SubDAG node, and then clicking "Zoom into SubDAG".

You can apply this concept by listing files and creating one task per file, instead of generating a random number of tasks as in the example. The tasks themselves can be arranged in parallel (as I did), sequentially, or in any other valid directed acyclic layout.
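As a rough sketch of the file-based variant, the helpers below are plain Python so they can be tried without Airflow installed. `file_task_specs` and `layout_edges` are hypothetical names, the `process_` prefix and `echo` command are placeholders, and the character set used to sanitise file names into task ids is an assumption rather than something taken from Airflow's documentation.

```python
import os
import re
import shlex


def file_task_specs(directory):
    """Return one (task_id, bash_command) pair per regular file in `directory`.

    Hypothetical helper: inside generate_subdag each pair would become
    BashOperator(task_id=task_id, bash_command=command, dag=subdag).
    """
    specs = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip sub-directories and other non-file entries
        # Assumption: keep task ids to letters, digits, '_', '.' and '-',
        # replacing any other character in the file name with '_'.
        safe_name = re.sub(r'[^A-Za-z0-9_.-]', '_', name)
        # Quote the path so file names with spaces survive the shell.
        specs.append(('process_%s' % safe_name, 'echo %s' % shlex.quote(path)))
    return specs


def layout_edges(task_ids, mode='parallel'):
    """Return (upstream, downstream) pairs for the chosen layout.

    'parallel' adds no edges, so all tasks may run at the same time;
    'sequential' chains the tasks in list order.
    """
    if mode == 'parallel':
        return []
    if mode == 'sequential':
        return list(zip(task_ids, task_ids[1:]))
    raise ValueError('unknown layout: %r' % mode)
```

Inside `generate_subdag` you would replace the `randint` loop with one `BashOperator` per pair and call `upstream.set_downstream(downstream)` for each edge. Because the directory is listed whenever the DAG file is parsed, every machine that parses the file (scheduler and workers) needs to see the same directory contents.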
