Unexpected Airflow behaviour in dynamic task generation

Submitted by 爱⌒轻易说出口 on 2021-02-11 04:29:16

Question


For reasons acceptable to me, I am trying to dynamically generate ExternalTaskSensor tasks with a different execution_date_fn in each iteration. The callable passed to the execution_date_fn kwarg must take dt as input and return an execution_date as output, so I am writing it as a lambda function, e.g. lambda dt: get_execution_date(i).

I noticed that passing execution_date_fn as a lambda function defined in a loop results in unexpected behaviour: all generated tasks end up with the same execution_date.

I then found that this behaviour is not intrinsic to ExternalTaskSensor but originates somewhere else. It can be reproduced with this example:

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from datetime import datetime

dag = DAG(
    'test_lambda',
    schedule_interval=None,
    start_date=datetime(2021,1,1),
    catchup=False
)

for task_id in ['task1', 'task2']:
    task = PythonOperator(
        task_id='printer_'+task_id,
        python_callable=lambda: print(task_id),
        dag=dag
    )

As a result, both tasks, printer_task1 and printer_task2, print 'task2' in the logs.

I have managed to correct the behaviour by moving the task instantiation into a function:

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from datetime import datetime

def create_task(task_id):
    task = PythonOperator(
        task_id='printer_'+task_id,
        python_callable=lambda: print(task_id),
        dag=dag
    )
    return task

dag = DAG(
    'test_lambda',
    schedule_interval=None,
    start_date=datetime(2021,1,1),
    catchup=False
)

for task_id in ['task1', 'task2']:
    task = create_task(task_id)

In this case, task printer_task1 prints 'task1' and printer_task2 prints 'task2' in the logs.

I would like to know why I am observing this behaviour.

DISCLAIMER: I am aware that the normal way to provide arguments to a PythonOperator is via the op_args kwarg. Lambda functions were used solely to provide an example, as the op_args option is not available in ExternalTaskSensor when using execution_date_fn.

EDIT: This is a lambda issue and not Airflow-specific. Official Python documentation has a topic on the issue: https://docs.python.org/3/faq/programming.html#why-do-lambdas-defined-in-a-loop-with-different-values-all-return-the-same-result
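
For the original execution_date_fn use case, one possible workaround is a small factory function that binds the loop value before the callable is handed to the sensor. The sketch below is plain Python and does not import Airflow; get_execution_date, make_execution_date_fn and the days_back parameter are hypothetical names chosen for illustration, not part of the Airflow API.

```python
from datetime import datetime, timedelta


def get_execution_date(dt, days_back):
    """Hypothetical helper: shift the given date back by days_back days."""
    return dt - timedelta(days=days_back)


def make_execution_date_fn(days_back):
    """Return a callable of dt with days_back fixed for this call only."""
    # days_back is a parameter of the factory, so every call to the factory
    # gets its own binding instead of sharing the loop variable.
    def execution_date_fn(dt):
        return get_execution_date(dt, days_back)

    return execution_date_fn


# Build one callable per loop value, as one would for each generated sensor.
execution_date_fns = {}
for days_back in [1, 2, 3]:
    execution_date_fns[days_back] = make_execution_date_fn(days_back)

# Calling each function with the same dt now gives three different dates.
dt = datetime(2021, 1, 10)
resolved = {d: fn(dt) for d, fn in execution_date_fns.items()}
```

Each returned function could then be passed as execution_date_fn when instantiating the sensor inside the loop. It accepts only dt, which matches the signature described in the question.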


Answer 1:


This has little to do with Airflow; it is a lambda issue:

>>> ls = [lambda: i for i in [1,2]]
>>> ls[0]()
2
>>> ls[1]()
2

To understand why it does that, I recommend reading that Stack Overflow post, which will probably explain it better than I could.
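
In short, a lambda looks up the names in its body when it is called, not when it is defined, so every lambda created in the loop sees the last value of the loop variable. The following minimal sketch reproduces this and shows two common ways to bind the value early: a default argument, and a factory function like the create_task fix in the question. The names make_getter, late, bound and factory are illustrative only.

```python
# Late binding: `i` is looked up when each lambda is called, after the loop ended.
late = [lambda: i for i in [1, 2]]
late_results = [f() for f in late]

# Fix 1: a default argument is evaluated once, when the lambda is defined.
bound = [lambda i=i: i for i in [1, 2]]
bound_results = [f() for f in bound]


# Fix 2: a factory function gives each lambda its own enclosing scope.
def make_getter(value):
    return lambda: value


factory = [make_getter(i) for i in [1, 2]]
factory_results = [f() for f in factory]

print(late_results)     # [2, 2]
print(bound_results)    # [1, 2]
print(factory_results)  # [1, 2]
```

This is why moving the operator instantiation into create_task works: task_id becomes a parameter of the function, so each call has its own binding for the lambda to read.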



Source: https://stackoverflow.com/questions/65664442/unexpected-airflow-behaviour-in-dynamic-task-generation
