Fusing operators together

Submitted by 我只是一个虾纸丫 on 2019-11-29 11:38:25

I have combined various hooks to create a single operator based on my needs. A simple example: I combined the GCS hook's delete, copy, list and get_size methods into a single operator called GcsDataValidationOperator. A rule of thumb is to keep the operator idempotent, i.e. running it multiple times should produce the same result.
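To make the idea concrete, here is a minimal sketch of such a fused operator. It is not the actual GcsDataValidationOperator: `InMemoryGcsHook` is a hypothetical stand-in for the GCS hook (its method signatures are assumptions), and `GcsDataValidationSketch` is a plain class, whereas a real operator would subclass Airflow's BaseOperator. The validation rule (non-empty objects are copied, empty ones are deleted) is also just an assumption for illustration.

```python
class InMemoryGcsHook:
    """Hypothetical stand-in for a GCS hook; stores {bucket: {object_name: bytes}}."""

    def __init__(self, buckets):
        self.buckets = buckets

    def list(self, bucket, prefix=""):
        return sorted(n for n in self.buckets.get(bucket, {}) if n.startswith(prefix))

    def get_size(self, bucket, name):
        return len(self.buckets[bucket][name])

    def copy(self, src_bucket, src_name, dst_bucket, dst_name):
        # Overwrites the destination, so repeating the copy changes nothing.
        self.buckets.setdefault(dst_bucket, {})[dst_name] = self.buckets[src_bucket][src_name]

    def delete(self, bucket, name):
        # Deleting a missing object is a no-op, which keeps reruns safe.
        self.buckets.get(bucket, {}).pop(name, None)


class GcsDataValidationSketch:
    """Plain-Python sketch; a real operator would subclass Airflow's BaseOperator."""

    def __init__(self, hook, source_bucket, prefix, valid_bucket):
        self.hook = hook
        self.source_bucket = source_bucket
        self.prefix = prefix
        self.valid_bucket = valid_bucket

    def execute(self, context=None):
        copied = []
        for name in self.hook.list(self.source_bucket, self.prefix):
            if self.hook.get_size(self.source_bucket, name) > 0:
                # Non-empty object: copy it to the bucket of validated data.
                self.hook.copy(self.source_bucket, name, self.valid_bucket, name)
                copied.append(name)
            else:
                # Empty object: treat it as invalid and remove it.
                self.hook.delete(self.source_bucket, name)
        return copied


hook = InMemoryGcsHook({"raw": {"in/a.csv": b"1,2", "in/empty.csv": b""}})
op = GcsDataValidationSketch(hook, "raw", "in/", "clean")
first_run = op.execute()
second_run = op.execute()  # rerunning leaves the buckets in the same state
print(first_run == second_run)
```

The second call is the idempotency check: copies overwrite and deletes tolerate missing objects, so a retry ends in the same state as the first run.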

Should operators be composed at all or is it better to have discrete steps?

The only pitfall is maintainability: when the hooks change in the master branch, you will need to update all your operators manually if there are any breaking changes.

Any pitfalls, improvements in above approaches?

You can use PythonOperator with the built-in hooks and the .execute method, but that would still mean a lot of detail in the DAG file. Hence, I would still go for the new-operator approach.
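For comparison, the PythonOperator route might look roughly like the sketch below. `FakeHook` and `validate_prefix` are hypothetical names, and the fake hook's method signatures are assumptions; in a real DAG the callable would create the actual GCS hook itself. The wiring is shown only as a comment so that the block runs without Airflow installed.

```python
class FakeHook:
    """Hypothetical stand-in for a GCS hook; maps object name -> size in bytes."""

    def __init__(self, sizes):
        self.sizes = sizes

    def list(self, bucket, prefix=""):
        return sorted(n for n in self.sizes if n.startswith(prefix))

    def get_size(self, bucket, name):
        return self.sizes[name]


def validate_prefix(hook, bucket, prefix):
    """Callable that a PythonOperator could run: fail if any object is empty."""
    names = hook.list(bucket, prefix)
    empty = [n for n in names if hook.get_size(bucket, n) == 0]
    if empty:
        raise ValueError(f"empty objects found: {empty}")
    return len(names)


# In a DAG file this would be wired up roughly as:
#   PythonOperator(
#       task_id="validate",
#       python_callable=validate_prefix,
#       op_kwargs={"hook": ..., "bucket": "raw", "prefix": "in/"},
#   )
# All of the validation logic above would still live next to the DAG definition.
checked = validate_prefix(FakeHook({"in/a.csv": 3, "in/b.csv": 7}), "raw", "in/")
print(checked)
```

This works, but every DAG that needs the check has to carry or import the callable and its arguments, which is the clutter a dedicated operator avoids.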

Any other ways to combine operators together?

Hooks are just interfaces to external platforms and databases like Hive, GCS, etc., and form the building blocks for operators. This allows the creation of new operators. It also means you can customize the templated fields, add a Slack notification at each granular step inside your new operator, and have your own logging details.
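A rough sketch of what that customization could look like is below. Everything here is illustrative: `SizeOnlyHook` and `StepwiseOperatorSketch` are hypothetical names, the `notify` callback stands in for a Slack call, and the naive string replacement in `_render` only imitates the Jinja rendering that Airflow applies to the attributes listed in `template_fields` before `execute` runs.

```python
import logging

log = logging.getLogger("fused_operator_sketch")


class SizeOnlyHook:
    """Hypothetical stand-in for a GCS hook; maps object name -> size in bytes."""

    def __init__(self, sizes):
        self.sizes = sizes

    def list(self, bucket, prefix=""):
        return sorted(n for n in self.sizes if n.startswith(prefix))

    def get_size(self, bucket, name):
        return self.sizes[name]


class StepwiseOperatorSketch:
    # Airflow renders the attributes named here with Jinja before execute();
    # _render below is only a naive imitation of that step.
    template_fields = ("prefix",)

    def __init__(self, hook, bucket, prefix, notify=None):
        self.hook = hook
        self.bucket = bucket
        self.prefix = prefix
        # notify stands in for a Slack notification; the default does nothing.
        self.notify = notify or (lambda message: None)

    def _render(self, context):
        for field in self.template_fields:
            value = getattr(self, field)
            for key, replacement in context.items():
                value = value.replace("{{ " + key + " }}", str(replacement))
            setattr(self, field, value)

    def _step(self, message):
        # One place for per-step logging and notification.
        log.info(message)
        self.notify(message)

    def execute(self, context):
        self._render(context)
        names = self.hook.list(self.bucket, self.prefix)
        self._step(f"listed {len(names)} object(s) under {self.prefix}")
        total = sum(self.hook.get_size(self.bucket, n) for n in names)
        self._step(f"total size is {total} byte(s)")
        return total


messages = []
op = StepwiseOperatorSketch(
    SizeOnlyHook({"data/2019-11-29/a": 3, "data/2019-11-28/b": 5}),
    "raw",
    "data/{{ ds }}/",
    notify=messages.append,
)
total_size = op.execute({"ds": "2019-11-29"})
```

Because each hook call is a separate step inside `execute`, the operator can report progress between them, which is harder to do when the steps are hidden inside someone else's operator.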

In the taxonomy of Airflow, is the primary motive of Hooks the same as above, or do they serve some other purposes too?

FWIW: I am a PMC member and a contributor of the Airflow project.
