Test Dag run for Airflow 1.9 in unittest


I'm not familiar with Airflow 1.7, but I guess it didn't have the same "DagBag" concept that Airflow 1.8 and upwards have.

You can't run a DAG that you have created like this, because dag.run() starts a new Python process, which has to find the DAG by parsing a dag folder on disk - and it can't. This was mentioned in a message in the output (but you didn't include the full error message/output).
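If what you actually want is to exercise a real DAG from disk, you can parse it through a DagBag the same way the scheduler would. Here's a minimal sketch, assuming your DAG file defines a dag_id of 'my_dag' and lives in a dags/ folder relative to the test (both names are illustrative):

import unittest
from airflow.models import DagBag


class DagBagTest(unittest.TestCase):
    def test_dag_loads(self):
        # Parse the dags folder the same way the scheduler does
        dagbag = DagBag(dag_folder='dags', include_examples=False)
        self.assertEqual(dagbag.import_errors, {})
        dag = dagbag.get_dag('my_dag')
        self.assertIsNotNone(dag)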

What are you trying to test by creating a dag in the test files? Is it a custom operator? Then you would be better off testing that directly. For instance, here is how I test a custom operator stand-alone:

import unittest
from datetime import datetime

from airflow import DAG
from airflow.models import TaskInstance

import myplugin  # the plugin under test

TEST_DAG_ID = 'my_test_dag'          # illustrative values
DEFAULT_DATE = datetime(2019, 1, 1)


class MyPluginTest(unittest.TestCase):
    def setUp(self):
        dag = DAG(TEST_DAG_ID, schedule_interval='* * * * Thu', default_args={'start_date': DEFAULT_DATE})
        self.dag = dag
        self.op = myplugin.FindTriggerFileForExecutionPeriod(
            dag=dag,
            task_id='test',
            prefix='s3://bucket/some/prefix',
        )
        self.ti = TaskInstance(task=self.op, execution_date=DEFAULT_DATE)

        # Other S3 setup here, specific to my test

    def test_execute_no_trigger(self):
        # The operator is expected to raise when no trigger file exists
        with self.assertRaises(RuntimeError):
            self.ti.run(ignore_ti_state=True)

        # It shouldn't have pushed anything to XCom
        self.assertIsNone(self.ti.xcom_pull(task_ids=self.op.task_id))

Here's a function you can use in a pytest test case that will run the tasks of your DAG in order.

from datetime import timedelta


def run_dag(dag):
    """Run every task in the DAG in dependency order (upstream tasks first)."""
    dag._schedule_interval = timedelta(days=1)  # override because '@once' runs get skipped
    done = set()

    def run(key):
        if key in done:
            return
        task = dag.task_dict[key]
        # Make sure all upstream tasks have run first
        for k in task.upstream_task_ids:
            run(k)
        print(f'running task {key}...')
        date = dag.default_args['start_date']
        task.run(date, date, ignore_ti_state=True)
        done.add(key)

    for k in dag.task_dict:
        run(k)

You can then call run_dag(dag) instead of dag.run() in your test; the helper deliberately isn't named test_*, so pytest won't try to collect it as a test of its own.
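For example, a minimal usage sketch (build_dag is a hypothetical factory that returns your DAG):

from my_dags import build_dag  # hypothetical module/factory returning your DAG


def test_my_dag_runs_end_to_end():
    dag = build_dag()
    run_dag(dag)  # task.run() raises if any task fails, which fails the test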

You'll need to make sure the logging in your custom operators uses self.log.info() rather than logging.info() or print(), or the messages won't show up.
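For instance, a minimal sketch of an operator that logs this way (HelloOperator is a made-up example):

from airflow.models import BaseOperator


class HelloOperator(BaseOperator):
    def execute(self, context):
        # self.log goes through Airflow's task logger, unlike print()/logging.info()
        self.log.info('running for execution date %s', context['execution_date'])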

You may also need to run your test using python -m pytest -s test_my_dag.py; without the -s flag, pytest captures stdout, so Airflow's log output won't be shown.

I'm still trying to figure out how to handle inter-DAG dependencies.
