Airflow - creating dynamic Tasks from XCOM


I wouldn't do what you're trying to achieve, mainly because:

  1. An XCom value is state generated at runtime
  2. The DAG structure is determined at parse time

Even if you use something like the following to get access to XCom values generated by some upstream task:

from datetime import datetime

from airflow.models import DAG, TaskInstance
from airflow.operators.python_operator import PythonOperator
from airflow.utils.db import provide_session

dag = DAG(...)

@provide_session
def get_files_list(session=None):
    execution_date = dag.previous_schedule(datetime.now())

    # Find the previous task instance
    # (upstream_task_id is the id of the task whose XCom value we want to read):
    ti = session.query(TaskInstance).filter(
        TaskInstance.dag_id == dag.dag_id,
        TaskInstance.execution_date == execution_date,
        TaskInstance.task_id == upstream_task_id).first()
    if ti:
        files_list = ti.xcom_pull()
        if files_list:
            return files_list
    # Return a default state:
    return {...}


files_list = get_files_list()
# Generate tasks based on the upstream task's state:
task = PythonOperator(
    ...
    xcom_push=True,
    dag=dag)

But this would behave very strangely, because DAG parsing and task execution are not synchronised in the way you would need them to be.

If the main reason you want to do this is parallelising file processing, I'd have some static number of processing tasks (determined by the required parallelism) that read the files list from the upstream task's XCom value and operate on the relevant portion of that list, as sketched below.
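
A rough sketch of that idea, assuming Airflow 1.x; the worker count, the list_files task id, list_files_task and the handle_file helper are made up for illustration, not part of the original answer:

from airflow.operators.python_operator import PythonOperator

NUM_WORKERS = 4  # fixed parallelism chosen at DAG definition time, not derived from XCom

def process_chunk(worker_index, **context):
    # Pull the full files list pushed by the upstream task...
    files_list = context['ti'].xcom_pull(task_ids='list_files') or []
    # ...and handle only this worker's slice of it.
    for path in files_list[worker_index::NUM_WORKERS]:
        handle_file(path)  # hypothetical per-file processing function

for i in range(NUM_WORKERS):
    worker = PythonOperator(
        task_id='process_files_{}'.format(i),
        python_callable=process_chunk,
        op_kwargs={'worker_index': i},
        provide_context=True,
        dag=dag)
    list_files_task >> worker  # list_files_task is the upstream task that pushes the list

This way the DAG shape stays static, while the actual work is still split across tasks at runtime based on the XCom value.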

Another option is to parallelise the file processing with a framework for distributed computation such as Apache Spark.

The simplest way I can think of is to use a branch operator: https://github.com/apache/airflow/blob/master/airflow/example_dags/example_branch_operator.py
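
A minimal sketch of that approach, again for Airflow 1.x; the task ids, the choose_path callable and process_all_files are assumptions for illustration only:

from airflow.operators.python_operator import BranchPythonOperator, PythonOperator
from airflow.operators.dummy_operator import DummyOperator

def choose_path(**context):
    # Decide which downstream task to run based on the upstream XCom value.
    files_list = context['ti'].xcom_pull(task_ids='list_files')
    return 'process_files' if files_list else 'no_files'

branch = BranchPythonOperator(
    task_id='branch_on_files',
    python_callable=choose_path,
    provide_context=True,
    dag=dag)

process_files_task = PythonOperator(
    task_id='process_files',
    python_callable=process_all_files,  # hypothetical callable that processes the whole list
    dag=dag)

no_files_task = DummyOperator(task_id='no_files', dag=dag)

branch >> [process_files_task, no_files_task]

The branch still can't change the DAG's shape, but it can pick which of the pre-declared paths actually runs, based on the XCom value.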
