Airflow issue with branching tasks

Posted by 不问归期 on 2021-02-19 07:34:42

Question


I am trying to set up a DAG where one task runs every minute and another task runs on every 5th minute, right before the 1-minute task. This is just a test; I am not planning to run jobs at such short intervals.

Visually, my DAG looks like this:

And the code itself like this:

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import BranchPythonOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2018, 10, 9)
}

now = datetime.now()
minute_check = now.minute % 5

dag = DAG(
    dag_id='test3',
    default_args=default_args,
    schedule_interval='* * * * *',
    dagrun_timeout=timedelta(minutes=5),
    catchup=False,
    max_active_runs=99
)

def check_minute():
    if minute_check == 0:
        return "branch_fiveminute"
    else:
        return "branch_minute"

branch_task = BranchPythonOperator(
    task_id='branch_task',
    python_callable=check_minute,
    trigger_rule='all_done',
    dag=dag)

branch_minute = BashOperator(
    task_id='branch_minute',
    bash_command='test1min.sh ',
    trigger_rule='all_done',
    dag=dag)

branch_fiveminute = BashOperator(
    task_id='branch_fiveminute',
    bash_command='test5min.sh ',
    trigger_rule='all_done',
    dag=dag)

branch_task.set_downstream(branch_minute)
branch_task.set_downstream(branch_fiveminute)
branch_fiveminute.set_downstream(branch_minute)

The problem I am getting is that on the 5th minute, Airflow skips the 1-minute task:

I have tried playing around with the trigger_rule settings without much success.

Any ideas what's wrong? I am using Airflow 1.10, if it matters.


Answer 1:


Because the 5-minute task follows a different execution path, the 1-minute task gets skipped. It's a little counterintuitive from the diagram, but only one path will execute.
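To make the skip easier to see, here is a small plain-Python model of the question's graph. It is not Airflow code: `directly_skipped` is a hypothetical helper, and it assumes the branch operator marks every direct downstream task other than the chosen one as skipped, which is a simplification of what Airflow 1.10 appears to do.

```python
# Simplified model of branching, NOT Airflow code.
# Assumption: every direct child of the branch task that is not the
# chosen task_id gets marked as skipped.

def directly_skipped(downstream, branch_task_id, chosen):
    """Return the direct children of the branch task that were not chosen."""
    return sorted(t for t in downstream[branch_task_id] if t != chosen)

# Edges from the question's DAG: branch_minute is a direct child of
# branch_task as well as a child of branch_fiveminute.
question_edges = {
    "branch_task": ["branch_minute", "branch_fiveminute"],
    "branch_fiveminute": ["branch_minute"],
}

# On the 5th minute the callable returns "branch_fiveminute".
skipped_on_fifth_minute = directly_skipped(
    question_edges, "branch_task", "branch_fiveminute"
)
print(skipped_on_fifth_minute)  # ['branch_minute']
```

Under this model, `branch_minute` is skipped as a direct child of `branch_task` whenever the 5-minute branch is chosen, even though it also sits downstream of `branch_fiveminute`.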

So what you have to do is put the branch at the beginning: one path leads into a dummy operator for the false case, and the other leads to the 5-minute task. Both the 5-minute task and the dummy operator then lead into the 1-minute task.

This way the dummy task gets skipped on the 5th minute, but the execution flow ends up in the 1-minute task regardless of which execution path is selected.

from airflow import DAG
from airflow.operators.python_operator import BranchPythonOperator
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2018, 10, 9)
}

now = datetime.now()
minute_check = now.minute % 5

dag = DAG(
    dag_id='test3',
    default_args=default_args,
    schedule_interval='* * * * *',
    dagrun_timeout=timedelta(minutes=5),
    catchup=False,
    max_active_runs=99
)

def check_minute():
    if minute_check == 0:
        return "branch_fiveminute"
    else:
        return "branch_false_1"

branch_task = BranchPythonOperator(
    task_id='branch_task',
    python_callable=check_minute,
    trigger_rule='all_done',
    dag=dag)

branch_minute = BashOperator(
    task_id='branch_minute',
    bash_command='test1min.sh ',
    trigger_rule='all_done',
    dag=dag)

branch_fiveminute = BashOperator(
    task_id='branch_fiveminute',
    bash_command='test5min.sh ',
    trigger_rule='all_done',
    dag=dag)

branch_false_1 = DummyOperator(task_id='branch_false_1', dag=dag)

branch_task.set_downstream(branch_false_1)
branch_task.set_downstream(branch_fiveminute)
branch_fiveminute.set_downstream(branch_minute)
branch_false_1.set_downstream(branch_minute)
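One detail worth checking in the code above: `minute_check` is computed at module level, so its value is fixed when the DAG file is parsed rather than when `branch_task` actually runs, and the two may not always fall in the same minute. A minimal sketch of a callable that reads the clock at run time instead is below; `choose_branch` is a hypothetical helper name introduced only to keep the decision testable, and the task ids are the ones used above.

```python
from datetime import datetime


def choose_branch(minute):
    """Map a minute value to the task_id the branch should follow."""
    return "branch_fiveminute" if minute % 5 == 0 else "branch_false_1"


def check_minute():
    # Read the clock when the task executes, not when the DAG file is parsed.
    return choose_branch(datetime.now().minute)
```

`check_minute` can be passed as `python_callable` to the `BranchPythonOperator` exactly as before; only the place where the minute is read changes.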


Source: https://stackoverflow.com/questions/52868189/airflow-issue-with-branching-tasks
