Airflow - creating dynamic Tasks from XCOM

广开言路 2020-12-17 02:25

I'm attempting to generate a set of dynamic tasks from an XCOM variable. In the XCOM I'm storing a list and I want to use each element of the list to dynamically create a

2 Answers
  • 2020-12-17 03:09

    I wouldn't do what you're trying to achieve, mainly because:

    1. An XCOM value is state generated at run time.
    2. The DAG structure is determined at parse time.

    Even if you use something like the following to get access to XCOM values generated by some upstream task:

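    # Import paths below are the Airflow 1.x ones, matching airflow.utils.db.
    from datetime import datetime
    
    from airflow import DAG
    from airflow.models import TaskInstance
    from airflow.operators.python_operator import PythonOperator
    from airflow.utils.db import provide_session
    
    dag = DAG(...)
    
    @provide_session
    def get_files_list(session=None):
        # Schedule point that precedes the current (parse-time) clock.
        execution_date = dag.previous_schedule(datetime.now())
    
        # Find previous task instance
        # (upstream_task_id is assumed to be defined elsewhere):
        ti = session.query(TaskInstance).filter(
            TaskInstance.dag_id == dag.dag_id,
            TaskInstance.execution_date == execution_date,
            TaskInstance.task_id == upstream_task_id).first()
        if ti:
            # Pull the value the upstream task returned.
            files_list = ti.xcom_pull(task_ids=upstream_task_id)
            if files_list:
                return files_list
        # Return default state:
        return {...}
    
    
    # This runs every time the DAG file is parsed, not when a task runs.
    files_list = get_files_list()
    # Generate tasks based on upstream task state:
    task = PythonOperator(
        ...
        xcom_push=True,
        dag=dag)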
    

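    But this would behave very strangely, because DAG parsing and task execution are not synchronised in the way you would need: the scheduler re-parses the DAG file on its own cadence, independently of when the upstream task writes its XCOM value.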

    If the main reason you want to do this is to parallelise file processing, I'd have a static number of processing tasks (determined by the required parallelism) that each read the file list from the upstream task's XCOM value and operate on their own portion of that list.
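
    The static-workers idea can be sketched roughly as follows. This is only an illustration under assumptions: the names NUM_WORKERS, select_portion, process_portion and the upstream task id "list_files" are all hypothetical, and the function only relies on the task context containing a "ti" object with xcom_pull.

```python
NUM_WORKERS = 4  # assumed fixed parallelism, decided when the DAG is written


def select_portion(files, worker_index, num_workers=NUM_WORKERS):
    """Return the part of `files` that worker `worker_index` should handle."""
    # Stride slicing assigns every file to exactly one worker.
    return files[worker_index::num_workers]


def process_portion(worker_index, upstream_task_id="list_files", **context):
    """Callable for one of the static processing tasks (hypothetical names)."""
    # Pull the full list that the upstream task pushed at run time.
    files = context["ti"].xcom_pull(task_ids=upstream_task_id) or []
    portion = select_portion(files, worker_index)
    for path in portion:
        # Placeholder for the real per-file work.
        print("worker %d processing %s" % (worker_index, path))
    return portion
```

    Each of the NUM_WORKERS tasks would be a PythonOperator that calls process_portion with its own worker_index, so the DAG shape never depends on the XCOM contents; a worker whose slice is empty simply does nothing.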

    Another option is to parallelise the file processing with a distributed computing framework such as Apache Spark.

  • 2020-12-17 03:12

    The simplest way I can think of is to use a branch operator. https://github.com/apache/airflow/blob/master/airflow/example_dags/example_branch_operator.py
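
    A minimal sketch of what such a branch callable might look like, assuming the candidate downstream tasks are declared statically and the branch only decides which of them to follow. The task ids, the extension mapping and the "list_files" upstream id are hypothetical, and returning a list of task ids assumes an Airflow version whose branch operator accepts one (otherwise return a single id).

```python
import os

# Hypothetical mapping from file extension to a task_id that already
# exists in the DAG; nothing is created dynamically here.
TASK_FOR_EXTENSION = {".csv": "process_csv", ".json": "process_json"}
FALLBACK_TASK_ID = "nothing_to_do"  # hypothetical no-op task


def choose_branches(upstream_task_id="list_files", **context):
    """Return the ids of the static tasks that should run for this list."""
    files = context["ti"].xcom_pull(task_ids=upstream_task_id) or []
    extensions = {os.path.splitext(name)[1] for name in files}
    chosen = sorted(
        TASK_FOR_EXTENSION[ext] for ext in extensions if ext in TASK_FOR_EXTENSION
    )
    # Always return at least one task id so the branch has somewhere to go.
    return chosen or [FALLBACK_TASK_ID]
```

    The function would be passed as the python_callable of a BranchPythonOperator; tasks whose ids are not returned get skipped for that run.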
