Apache Airflow - get all parent task_ids

Submitted by 痞子三分冷 on 2019-12-02 03:35:52

The upstream_task_ids and downstream_task_ids properties of BaseOperator are meant just for this purpose.

from typing import List
..
# wrapped in list() because some Airflow versions return a set here
parent_task_ids: List[str] = list(my_task.upstream_task_ids)
child_task_ids: List[str] = list(my_task.downstream_task_ids)

Note, however, that these properties only give you the immediate (upstream / downstream) neighbours of a task. To get all ancestor or descendant tasks, you can quickly cook up the good old graph-theory approach, such as this BFS-like implementation:

from collections import deque
from typing import Deque, List, Set

from airflow.models import BaseOperator


def get_ancestor_tasks(my_task: BaseOperator) -> List[BaseOperator]:
    ancestor_task_ids: Set[str] = set()
    # seed the BFS queue with the immediate parents
    tasks_queue: Deque[BaseOperator] = deque(my_task.upstream_list)
    # perform BFS, skipping tasks already reached via another path
    while tasks_queue:
        task = tasks_queue.popleft()
        if task.task_id in ancestor_task_ids:
            continue
        ancestor_task_ids.add(task.task_id)
        tasks_queue.extend(task.upstream_list)
    # convert task_ids to actual tasks (in the DAG's own task order)
    return [task for task in my_task.dag.tasks if task.task_id in ancestor_task_ids]

The above snippet is NOT tested, but I'm sure you can take inspiration from it.
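
The same traversal works for descendants if you follow downstream_list instead of upstream_list. Below is a minimal sketch of that idea that runs without Airflow installed. FakeTask and get_relative_task_ids are hypothetical names made up for this illustration; FakeTask only mimics the three attributes the traversal reads (task_id, upstream_list, downstream_list), so treat it as a stand-in and not as Airflow's API.

```python
from collections import deque
from typing import List, Set


class FakeTask:
    """Hypothetical stand-in exposing only the attributes the traversal reads."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.upstream_list: List["FakeTask"] = []
        self.downstream_list: List["FakeTask"] = []

    def set_downstream(self, other: "FakeTask") -> None:
        # Record the edge self -> other on both ends.
        self.downstream_list.append(other)
        other.upstream_list.append(self)


def get_relative_task_ids(task, upstream: bool) -> Set[str]:
    """Collect ids of all ancestors (upstream=True) or descendants (upstream=False)."""

    def neighbours(t):
        return t.upstream_list if upstream else t.downstream_list

    seen: Set[str] = set()
    queue = deque(neighbours(task))
    while queue:
        current = queue.popleft()
        if current.task_id in seen:
            continue  # already reached via another path
        seen.add(current.task_id)
        queue.extend(neighbours(current))
    return seen


# Diamond-shaped graph: a -> b, a -> c, b -> d, c -> d
a, b, c, d = (FakeTask(name) for name in "abcd")
a.set_downstream(b)
a.set_downstream(c)
b.set_downstream(d)
c.set_downstream(d)

ancestors_of_d = get_relative_task_ids(d, upstream=True)
descendants_of_a = get_relative_task_ids(a, upstream=False)
```

The diamond shape is chosen on purpose: task a is reachable from d through both b and c, so the seen check is what keeps it from being expanded twice. Since the function only relies on duck typing, it should accept a real operator as well, though that has not been verified here.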

