Execute Airflow DAG instances (tasks) on a list of specific dates

Submitted by 孤者浪人 on 2019-12-13 03:46:48

Question


I would like to manage a couple of future releases using Apache Airflow. All of these releases are known well in advance, and I need to make sure that some data pushes are not forgotten.

The problem is that those future releases do not follow a simple periodic schedule that could be handled with a classic cron expression like 0 1 23 * * or a preset like @monthly.

Instead, the dates are irregular: 2019-08-24, 2019-09-30, 2019-10-20, ...

Is there a way to do this other than creating a separate mydag.py file for each of those future releases? What is the standard / recommended way to do this? Am I thinking about this the wrong way (I wonder because the documentation and tutorials focus on regular, periodic schedules)?


Answer 1:


I can think of two simple ways of doing this:

  1. Create 3-4 top-level DAGs, each with its own start_date (2019-08-24, 2019-09-30, ...) and schedule_interval='@once'.

  2. Create a single top-level DAG with schedule_interval=None (start_date can be anything). Then create a "triggering DAG" that uses TriggerDagRunOperator to conditionally trigger your actual workflow on the specific dates.
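Option 1 does not necessarily require one file per release. A minimal sketch, assuming Airflow 1.10 and that the scheduler creates the single '@once' run after start_date has passed: one module computes a DAG id and start_date per release date and registers the DAGs in its own globals. The names release_dag_specs, register_dags and the release_ prefix are hypothetical, not from the answer.

```python
from datetime import datetime

RELEASE_DATES = ["2019-08-24", "2019-09-30", "2019-10-20"]


def release_dag_specs(dates):
    """Return {dag_id: start_date} with one entry per release date."""
    return {
        "release_" + d.replace("-", "_"): datetime.strptime(d, "%Y-%m-%d")
        for d in dates
    }


def register_dags(namespace, dates=RELEASE_DATES):
    """Create one '@once' DAG per release date and expose it in `namespace`."""
    # Imported lazily so the date logic above can be used without Airflow installed.
    from airflow import DAG

    for dag_id, start_date in release_dag_specs(dates).items():
        namespace[dag_id] = DAG(
            dag_id=dag_id,
            start_date=start_date,
            schedule_interval="@once",
        )


specs = release_dag_specs(RELEASE_DATES)

# In the actual DAG file (assumption: Airflow discovers DAG objects in module globals):
# register_dags(globals())
```

Tasks would still need to be attached to each generated DAG; the sketch only covers the per-date scheduling part.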

Method 2 is clearly the better option.
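A minimal sketch of the decision logic for method 2, assuming the Airflow 1.10 TriggerDagRunOperator, whose python_callable receives (context, dag_run_obj) and triggers the target DAG only when it returns dag_run_obj. The ids trigger_release and release_workflow, and the payload key, are hypothetical.

```python
from types import SimpleNamespace

RELEASE_DATES = {"2019-08-24", "2019-09-30", "2019-10-20"}


def trigger_on_release_date(context, dag_run_obj):
    """Return dag_run_obj to trigger the target DAG, or None to skip the trigger."""
    if context["ds"] in RELEASE_DATES:
        dag_run_obj.payload = {"release_date": context["ds"]}
        return dag_run_obj
    return None


def build_trigger_task(dag):
    """Attach the conditional trigger to a daily 'triggering DAG'."""
    # Lazy import; this module path is the Airflow 1.10 one.
    from airflow.operators.dagrun_operator import TriggerDagRunOperator

    return TriggerDagRunOperator(
        task_id="trigger_release",           # hypothetical task id
        trigger_dag_id="release_workflow",   # hypothetical id of the schedule_interval=None DAG
        python_callable=trigger_on_release_date,
        dag=dag,
    )


# Exercising the callable with stand-in objects instead of a real Airflow context:
hit = trigger_on_release_date({"ds": "2019-08-24"}, SimpleNamespace(payload=None))
miss = trigger_on_release_date({"ds": "2019-08-25"}, SimpleNamespace(payload=None))
```

Later Airflow releases changed this operator's interface, so the callable-based wiring should be treated as specific to the 1.10 line.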




Answer 2:


You could give your DAG a @daily schedule, then start it with a ShortCircuitOperator task that checks whether the execution date matches a release date. If it does, the check passes and the rest of the DAG runs. Otherwise, all downstream tasks are skipped and no release happens. See an example of this operator being used in https://github.com/apache/airflow/blob/1.10.3/airflow/example_dags/example_short_circuit_operator.py.

I imagine it'd look something like this:

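from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import ShortCircuitOperator

RELEASE_DATES = ['2019-08-24', '2019-09-30', '2019-10-20']

# default_args was not shown in the original answer; this start_date is a placeholder
default_args = {'start_date': datetime(2019, 8, 1)}

dag = DAG(
    dag_id='my_dag',
    schedule_interval='@daily',
    default_args=default_args,
)

def check_release_date(**context):
    # context['ds'] is the execution date as 'YYYY-MM-DD'.
    # Returning False makes the operator skip all downstream tasks.
    return context['ds'] in RELEASE_DATES

skip_if_not_release_date = ShortCircuitOperator(
    task_id='skip_if_not_release_date',
    python_callable=check_release_date,
    dag=dag,
    provide_context=True,  # needed on Airflow 1.10 so the callable receives the context
)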

If release dates can change, you might want to make this more dynamic by storing them in an Airflow Variable, so the list can be updated without editing the DAG file:

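from airflow.models import Variable

def check_release_date(**context):
    # The 'release_dates' Variable is expected to hold a JSON list such as ["2019-08-24", "2019-09-30"]
    release_dates = Variable.get('release_dates', deserialize_json=True)
    return context['ds'] in release_dates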
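Because a Variable is edited by hand, a malformed entry could silently never match. A minimal sketch, assuming the Variable stores a JSON list of date strings, that validates and normalizes the entries before the membership test. The helpers parse_release_dates and is_release_day are hypothetical and use only the standard library.

```python
import json
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"


def parse_release_dates(raw):
    """Parse a JSON list of date strings into a set of normalized 'YYYY-MM-DD' strings."""
    dates = json.loads(raw)
    if not isinstance(dates, list):
        raise ValueError("release_dates must be a JSON list")
    # strptime raises ValueError for an invalid date; strftime re-pads e.g. '2019-8-24'.
    return {datetime.strptime(d, DATE_FORMAT).strftime(DATE_FORMAT) for d in dates}


def is_release_day(ds, raw):
    """Return True if the execution date string `ds` is one of the release dates."""
    return ds in parse_release_dates(raw)


example = '["2019-8-24", "2019-09-30"]'
matches = is_release_day("2019-08-24", example)
```

Inside the DAG, raw would be the string returned by Variable.get('release_dates') without deserialize_json, and is_release_day(context['ds'], raw) would replace the plain membership test.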

Also, if for whatever reason you need to override your hardcoded list of release dates, you can mark this task as success to force the DAG to run.



Source: https://stackoverflow.com/questions/57226707/execute-airflow-dag-instances-tasks-on-a-list-of-specific-dates
