Fusing operators together

日久生厌 2020-12-19 08:16

I'm still in the process of deploying Airflow and I've already felt the need to merge operators together. The most common use-case would

1 answer
  • 2020-12-19 08:52

    I have combined various hooks to create a single operator based on my needs. A simple example: I combined the delete, copy, list and get_size methods of the GCS hook into a single operator called GcsDataValidationOperator. A rule of thumb is to keep the operator idempotent, i.e. running it multiple times should produce the same result.
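A minimal sketch of that kind of fused operator, under stated assumptions: `InMemoryStorageHook` and `DataValidationOperator` are hypothetical stand-ins written in plain Python so the example runs without Airflow or GCS, and the validation rule (drop empty objects, move the rest) is invented for illustration. In real code the operator would subclass Airflow's `BaseOperator` and call a real GCS hook instead.

```python
from typing import Dict, List, Optional


class InMemoryStorageHook:
    """Hypothetical stand-in for a storage hook such as a GCS hook."""

    def __init__(self, objects: Dict[str, bytes]):
        self.objects = objects

    def list(self, prefix: str) -> List[str]:
        return sorted(name for name in self.objects if name.startswith(prefix))

    def get_size(self, name: str) -> int:
        return len(self.objects[name])

    def copy(self, source: str, destination: str) -> None:
        self.objects[destination] = self.objects[source]

    def delete(self, name: str) -> None:
        # Deleting a missing object is a no-op, which keeps reruns safe.
        self.objects.pop(name, None)


class DataValidationOperator:
    """Sketch of a fused operator: list, size check, copy and delete in one step."""

    def __init__(self, task_id: str, hook: InMemoryStorageHook,
                 source_prefix: str, valid_prefix: str):
        self.task_id = task_id
        self.hook = hook
        self.source_prefix = source_prefix
        self.valid_prefix = valid_prefix

    def execute(self, context: Optional[dict] = None) -> List[str]:
        for name in self.hook.list(self.source_prefix):
            if self.hook.get_size(name) == 0:
                # Empty objects fail validation and are dropped.
                self.hook.delete(name)
                continue
            target = self.valid_prefix + name[len(self.source_prefix):]
            self.hook.copy(name, target)
            self.hook.delete(name)
        # Report the validated objects, so a rerun returns the same result.
        return self.hook.list(self.valid_prefix)
```

Running `execute()` a second time finds nothing left under the source prefix, so both the stored objects and the returned list stay the same, which is the idempotency rule of thumb mentioned above.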

    Should operators be composed at all or is it better to have discrete steps?

    The only pitfall is maintainability: when the hooks change in the master branch, you will need to update all your operators manually if there are any breaking changes.

    Any pitfalls, improvements in above approaches?

    You can use PythonOperator and call the built-in hooks (or existing operators via their .execute method) from its callable, but that would still put a lot of detail in the DAG file. Hence, I would still go for the new-operator approach.
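For comparison, a rough sketch of the PythonOperator route. The hook here is again a hypothetical in-memory stand-in (`FakeStorageHook`), and `drop_empty_objects` and `make_hook` are made-up names; only `python_callable` and `op_kwargs` are actual PythonOperator parameters. The point is that the step-by-step logic ends up in, or next to, the DAG file.

```python
from typing import Callable, Dict, List


class FakeStorageHook:
    """Hypothetical in-memory stand-in for a real storage hook."""

    def __init__(self, objects: Dict[str, bytes]):
        self.objects = objects

    def list(self, prefix: str) -> List[str]:
        return sorted(name for name in self.objects if name.startswith(prefix))

    def get_size(self, name: str) -> int:
        return len(self.objects[name])

    def delete(self, name: str) -> None:
        self.objects.pop(name, None)


def drop_empty_objects(hook_factory: Callable[[], FakeStorageHook], prefix: str) -> List[str]:
    """Callable meant to be passed to PythonOperator as python_callable."""
    hook = hook_factory()
    removed = []
    for name in hook.list(prefix):
        if hook.get_size(name) == 0:
            hook.delete(name)
            removed.append(name)
    return removed


# In a DAG file this would be wired roughly as follows (not executed here;
# make_hook is a hypothetical function returning a configured hook):
#   PythonOperator(
#       task_id="drop_empty_objects",
#       python_callable=drop_empty_objects,
#       op_kwargs={"hook_factory": make_hook, "prefix": "incoming/"},
#   )
```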

    Any other ways to combine operators together?

    Hooks are just interfaces to external platforms and databases such as Hive, GCS, etc., and form the building blocks for operators. This allows the creation of new operators. It also means you can customize templated fields, add a Slack notification at each granular step inside your new operator, and have your own logging details.
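A small sketch of what that customization could look like. `StepReportingOperator`, its `steps` argument and the `notify` callback are hypothetical; `notify` stands in for whatever would post to Slack. `template_fields` is the class attribute Airflow uses to decide which attributes get Jinja-rendered before `execute()` runs; this plain-Python stand-in only declares it and does no rendering.

```python
import logging
from typing import Callable, List, Optional, Sequence

log = logging.getLogger("step_reporting_operator")


class StepReportingOperator:
    """Sketch of an operator that logs and notifies after each internal step."""

    # In Airflow, attributes named here are rendered as templates before execute().
    template_fields: Sequence[str] = ("source_prefix",)

    def __init__(self, task_id: str, source_prefix: str,
                 steps: Sequence[Callable[[str], None]],
                 notify: Callable[[str], None]):
        self.task_id = task_id
        self.source_prefix = source_prefix
        self.steps = steps
        self.notify = notify  # assumption: wraps a Slack call in real code

    def execute(self, context: Optional[dict] = None) -> List[str]:
        completed = []
        for step in self.steps:
            log.info("%s: starting step %s on %s",
                     self.task_id, step.__name__, self.source_prefix)
            step(self.source_prefix)
            self.notify(f"{self.task_id}: step {step.__name__} finished")
            completed.append(step.__name__)
        return completed
```

Passing `notify` in as a callable keeps the notification channel swappable, so the same operator can be tested with a plain list's `append` method.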

    In taxonomy of Airflow, is the primary motive of Hooks same as above, or do they serve some other purposes too?

    FWIW: I am a PMC member of and a contributor to the Airflow project.
