问题
If I have multiple airflow dags with some overlapping python package dependencies, how can I keep each of these project deps. decoupled? Eg. if I had project A and B on same server I would run each of them with something like...
source /path/to/virtualenv_a/activate
python script_a.py
deactivate
source /path/to/virtualenv_b/activate
python script_b.py
deactivate
Basically, would like to run dags with the same situation (eg. each dag uses python scripts that have may have overlapping package deps. that I would like to develop separately (ie. not have to update all code using a package when want to update the package just for one project)). Note, I've been using the BashOperator
to run python tasks like...
do_stuff = BashOperator(
task_id='my_task',
bash_command='python /path/to/script.py'),
execution_timeout=timedelta(minutes=30),
dag=dag)
Is there a way to get this working? IS there some other best-practice way that airflow intendeds for people to address (or avoid) these kinds of problems?
回答1:
you can use packaged dags, where each dag is packaged with its dependency http://airflow.apache.org/concepts.html#packaged-dags
回答2:
Based on discussion from the apache-airflow mailing list, the simplest answer that addresses the modular way in which I am using various python scripts for tasks is to directly call virtualenv python interpreter binaries for each script or module, eg.
source /path/to/virtualenv_a/activate
python script_a.py
deactivate
source /path/to/virtualenv_b/activate
python script_b.py
deactivate
would translate to something like
do_stuff_a = BashOperator(
task_id='my_task_a',
bash_command='/path/to/virtualenv_a/bin/python /path/to/script_a.py'),
execution_timeout=timedelta(minutes=30),
dag=dag)
do_stuff_b = BashOperator(
task_id='my_task_b',
bash_command='/path/to/virtualenv_b/bin/python /path/to/script_b.py'),
execution_timeout=timedelta(minutes=30),
dag=dag)
in an airflow dag.
来源:https://stackoverflow.com/questions/58403832/how-to-manage-python-packages-between-airflow-dags