How to manage python packages between airflow dags?

こ雲淡風輕ζ 提交于 2020-05-15 06:25:25

问题


If I have multiple airflow dags with some overlapping python package dependencies, how can I keep each of these project deps. decoupled? Eg. if I had project A and B on same server I would run each of them with something like...

source /path/to/virtualenv_a/activate
python script_a.py
deactivate
source /path/to/virtualenv_b/activate
python script_b.py
deactivate

Basically, would like to run dags with the same situation (eg. each dag uses python scripts that have may have overlapping package deps. that I would like to develop separately (ie. not have to update all code using a package when want to update the package just for one project)). Note, I've been using the BashOperator to run python tasks like...

do_stuff = BashOperator(
        task_id='my_task',
        bash_command='python /path/to/script.py'),
        execution_timeout=timedelta(minutes=30),
        dag=dag)

Is there a way to get this working? IS there some other best-practice way that airflow intendeds for people to address (or avoid) these kinds of problems?


回答1:


you can use packaged dags, where each dag is packaged with its dependency http://airflow.apache.org/concepts.html#packaged-dags




回答2:


Based on discussion from the apache-airflow mailing list, the simplest answer that addresses the modular way in which I am using various python scripts for tasks is to directly call virtualenv python interpreter binaries for each script or module, eg.

source /path/to/virtualenv_a/activate
python script_a.py
deactivate
source /path/to/virtualenv_b/activate
python script_b.py
deactivate

would translate to something like

do_stuff_a = BashOperator(
        task_id='my_task_a',
        bash_command='/path/to/virtualenv_a/bin/python /path/to/script_a.py'),
        execution_timeout=timedelta(minutes=30),
        dag=dag)
do_stuff_b = BashOperator(
        task_id='my_task_b',
        bash_command='/path/to/virtualenv_b/bin/python /path/to/script_b.py'),
        execution_timeout=timedelta(minutes=30),
        dag=dag)

in an airflow dag.



来源:https://stackoverflow.com/questions/58403832/how-to-manage-python-packages-between-airflow-dags

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!