How to skip tasks on Airflow?

浪子不回头ぞ submitted on 2020-01-23 07:51:19

Question


I'm trying to understand whether Airflow supports skipping tasks in a DAG for ad-hoc executions.

Let's say my DAG graph looks like this: task1 > task2 > task3 > task4

I would like to start my DAG manually from task3. What is the best way of doing that?

I've read about ShortCircuitOperator, but I'm looking for a more ad-hoc solution that can be applied once the execution is triggered.

Thanks!


Answer 1:


You can incorporate the SkipMixin that the ShortCircuitOperator uses under the hood to skip downstream tasks.

# Imports follow the Airflow 1.10-era module layout.
from airflow.models import BaseOperator, SkipMixin
from airflow.utils.decorators import apply_defaults


class mySkippingOperator(BaseOperator, SkipMixin):
    """Skip every downstream task when `condition` is falsy."""

    @apply_defaults
    def __init__(self,
                 condition,
                 *args,
                 **kwargs):
        super().__init__(*args, **kwargs)
        self.condition = condition

    def execute(self, context):
        # Truthy condition: return early so downstream tasks run as usual.
        if self.condition:
            self.log.info('Proceeding with downstream tasks...')
            return

        self.log.info('Skipping downstream tasks...')

        # Collect all transitive downstream tasks of this task.
        downstream_tasks = context['task'].get_flat_relatives(upstream=False)

        self.log.debug("Downstream task_ids %s", downstream_tasks)

        if downstream_tasks:
            # SkipMixin.skip marks the given tasks as skipped for this DAG run.
            self.skip(context['dag_run'], context['ti'].execution_date, downstream_tasks)

        self.log.info("Done.")
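
The operator in this answer receives its condition when the DAG file is parsed, so the value is fixed before any run is triggered. For an ad-hoc run, one option might be to make the decision inside execute from the parameters passed when the run is triggered. Below is a minimal sketch of that decision in plain Python, so it runs without Airflow. It assumes the trigger parameters are exposed as a dict on context['dag_run'].conf; the helper name should_proceed and the key skip_downstream are hypothetical, chosen only for illustration.

```python
from types import SimpleNamespace


def should_proceed(context, key="skip_downstream"):
    """Return True unless the triggering run asked to skip downstream tasks.

    `key` is a hypothetical parameter name, not an Airflow built-in.
    """
    dag_run = context.get("dag_run")
    # conf can be None (for example on scheduled runs), so fall back to {}.
    conf = getattr(dag_run, "conf", None) or {}
    return not bool(conf.get(key, False))


# Stand-ins for the context dict that Airflow passes to execute().
manual = {"dag_run": SimpleNamespace(conf={"skip_downstream": True})}
scheduled = {"dag_run": SimpleNamespace(conf=None)}

print(should_proceed(manual))     # False
print(should_proceed(scheduled))  # True
```

Inside execute, the check on self.condition could then be replaced by a call such as should_proceed(context), leaving the rest of the operator unchanged.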



Answer 2:


Given the way Apache Airflow is built, you can write logic or branches that determine which tasks run.

BUT

You cannot start execution from an arbitrary task in the middle of the DAG. The ordering is defined entirely by dependency management (upstream/downstream).

However, if you are using the Celery executor, you can ignore all dependencies for a run and ask Airflow to execute a task directly. Even then, this will not prevent the upstream tasks from being scheduled.
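
To make the dependency point concrete, here is a small sketch in plain Python (no Airflow import) that lists everything sitting upstream of a chosen task. Those are the tasks that would have to be skipped or bypassed for a run to effectively start at task3. The function name upstream_of and the dictionary layout are assumptions made for illustration, not part of Airflow.

```python
def upstream_of(task_id, downstream_map):
    """Return the set of tasks that normally have to finish before task_id.

    downstream_map maps each task to the list of its direct downstream tasks.
    """
    # Invert the mapping so each task points at its direct upstream tasks.
    parents = {}
    for parent, children in downstream_map.items():
        for child in children:
            parents.setdefault(child, set()).add(parent)

    # Walk upwards from task_id, collecting every ancestor once.
    seen = set()
    stack = [task_id]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# The chain from the question: task1 > task2 > task3 > task4
chain = {"task1": ["task2"], "task2": ["task3"], "task3": ["task4"], "task4": []}
print(sorted(upstream_of("task3", chain)))  # ['task1', 'task2']
```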




Answer 3:


Maayan, there is a very dirty but very simple and obvious solution that takes practically 30 seconds. It is only possible if you can easily update code in PROD and can temporarily prevent others from running the DAG. Just comment out the tasks you want to skip:

# task1 > task2 >
task3 > task4

A more serious solution, requiring more effort, would probably be to create the DAG dynamically based on a start_from_task parameter, so that the dependencies are built from it. The parameter can be changed in the UI through the Admin > Variables menu. You could probably also use a second variable holding an expiration time for the first one, e.g. the DAG ignores task1 and task2 until 14:05:30 and afterwards runs the whole DAG.
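
The dynamic approach could be sketched roughly as follows. This is plain Python covering only the selection logic, under the assumption that the DAG is a simple linear chain. In a real DAG file the two inputs would presumably be read from Airflow Variables, and each returned pair would be wired as upstream >> downstream. The function names active_tasks and dependency_pairs and the expires_at parameter are hypothetical.

```python
from datetime import datetime

TASK_ORDER = ["task1", "task2", "task3", "task4"]


def active_tasks(order, start_from=None, expires_at=None, now=None):
    """Return the tasks to include in the DAG for the current parse."""
    now = now or datetime.now()
    # No override, or an unknown task name: build the whole chain.
    if start_from is None or start_from not in order:
        return list(order)
    # The override has expired: build the whole chain again.
    if expires_at is not None and now >= expires_at:
        return list(order)
    # Otherwise keep only start_from and everything after it.
    return order[order.index(start_from):]


def dependency_pairs(tasks):
    """Return (upstream, downstream) pairs for a linear chain of tasks."""
    return list(zip(tasks, tasks[1:]))


expiry = datetime(2020, 1, 23, 14, 5, 30)

before = active_tasks(TASK_ORDER, "task3", expiry, now=datetime(2020, 1, 23, 14, 0, 0))
print(before)                    # ['task3', 'task4']
print(dependency_pairs(before))  # [('task3', 'task4')]

after = active_tasks(TASK_ORDER, "task3", expiry, now=datetime(2020, 1, 23, 14, 6, 0))
print(after)                     # ['task1', 'task2', 'task3', 'task4']
```

Because the task list is recomputed each time the DAG file is parsed, the override only takes effect once the scheduler has re-read the file after the variable changes.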




Answer 4:


Yes, you just click on task 3. Toggle the check boxes to the right of the run button to ignore dependencies, then click run.



Source: https://stackoverflow.com/questions/52190926/how-to-skip-tasks-on-airflow
