apache-airflow

Airflow tasks get stuck at “queued” status and never get running

假装没事ソ submitted on 2019-11-29 03:08:51
I'm using Airflow v1.8.1 and run all components (worker, web, flower, scheduler) on Kubernetes & Docker. I use the Celery Executor with Redis, and my tasks look like:

(start) -> (do_work_for_product1)
         ├ -> (do_work_for_product2)
         ├ -> (do_work_for_product3)
         ├ -> …

So the start task has multiple downstream tasks. I set up the concurrency-related configuration as below:

parallelism = 3
dag_concurrency = 3
max_active_runs = 1

Then when I run this DAG manually (not sure if it never happens on a scheduled run), some downstream tasks get executed, but others get stuck at "queued" status. If I clear the task from …
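For illustration, a minimal sketch of the fan-out shape and settings described above (the task ids, the use of DummyOperator, and setting the limits at the DAG level are assumptions; the global parallelism value can only be set in airflow.cfg):

from datetime import datetime
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

# DAG-level equivalents of the quoted settings; parallelism = 3 still lives
# in airflow.cfg and caps running task instances across all DAGs.
dag = DAG(
    'fan_out_example',
    start_date=datetime(2019, 1, 1),
    schedule_interval='@daily',
    concurrency=3,       # at most 3 running tasks for this DAG (dag_concurrency)
    max_active_runs=1,   # only one active DAG run at a time
)

start = DummyOperator(task_id='start', dag=dag)
for product in ('product1', 'product2', 'product3'):
    # one downstream task per product, all fanning out from start
    start >> DummyOperator(task_id='do_work_for_{}'.format(product), dag=dag)

With parallelism = 3, at most three task instances can run at once across the whole installation, so downstream tasks beyond that limit wait in "queued" until a slot frees up.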

Make custom Airflow macros expand other macros

萝らか妹 submitted on 2019-11-28 09:51:50
Is there any way to make a user-defined macro in Airflow which is itself computed from other macros?

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    'simple',
    schedule_interval='0 21 * * *',
    user_defined_macros={
        'next_execution_date': '{{ dag.following_schedule(execution_date) }}',
    },
)

task = BashOperator(
    task_id='bash_op',
    bash_command='echo "{{ next_execution_date }}"',
    dag=dag,
)

The use case here is to back-port the new Airflow v1.8 next_execution_date macro to work in Airflow v1.7. Unfortunately, this template is rendered without macro expansion …
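One possible workaround (a sketch, not necessarily the answer from the original thread): register a plain Python callable as the user-defined macro instead of a template string. The string value above is substituted verbatim and never re-rendered by Jinja, whereas a callable is evaluated at render time with whatever context objects you pass it. The name compute_next_execution_date is hypothetical:

from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    'simple',
    start_date=datetime(2019, 1, 1),
    schedule_interval='0 21 * * *',
    user_defined_macros={
        # a callable instead of a nested template string
        'compute_next_execution_date': lambda execution_date, dag: dag.following_schedule(execution_date),
    },
)

task = BashOperator(
    task_id='bash_op',
    # execution_date and dag are already in the Jinja context, so they can be
    # passed straight into the callable at render time
    bash_command='echo "{{ compute_next_execution_date(execution_date, dag) }}"',
    dag=dag,
)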

Airflow default on_failure_callback

廉价感情. submitted on 2019-11-28 08:30:36
In my DAG file, I have defined an on_failure_callback() function to post a Slack message in case of failure. It works well if I specify it for each operator in my DAG:

on_failure_callback=on_failure_callback

Is there a way to automate (via default_args for instance, or via my DAG object) the dispatch to all of my operators?

I finally found a way to do that. You can pass your on_failure_callback in default_args:

class Foo:
    @staticmethod
    def get_default_args():
        """
        Return default args
        :return: default_args
        """
        default_args = {
            'on_failure_callback': Foo.on_failure_callback
        }
        return default_args
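A hedged sketch of how the default_args approach could be wired into a DAG so that every operator inherits the callback; the Slack call is replaced by a placeholder print, and names like failure_callback_example are hypothetical:

from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator


def on_failure_callback(context):
    # placeholder: a real callback would post to Slack using the
    # task/dag information available in `context`
    print('Task failed: {}'.format(context.get('task_instance')))


default_args = {
    # picked up by every operator created with this DAG's default_args
    'on_failure_callback': on_failure_callback,
}

dag = DAG(
    'failure_callback_example',
    default_args=default_args,
    start_date=datetime(2019, 1, 1),
    schedule_interval=None,
)

# no on_failure_callback passed here; it is inherited from default_args
task = BashOperator(task_id='always_fails', bash_command='exit 1', dag=dag)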

How to dynamically iterate over the output of an upstream task to create parallel tasks in airflow?

浪子不回头ぞ submitted on 2019-11-27 21:41:28
Consider the following example of a DAG where the first task, get_id_creds, extracts a list of credentials from a database. This operation tells me which users in my database I am able to run further data preprocessing on, and it writes those ids to the file /tmp/ids.txt. I then scan those ids into my DAG and use them to generate a list of upload_transaction tasks that can be run in parallel. My question is: is there a more idiomatically correct, dynamic way to do this using Airflow? What I have here feels clumsy and brittle. How can I directly pass a list of valid IDs from one process to that …
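For context, a sketch of the parse-time fan-out the question describes (the script paths and the upload command are placeholders). The id file is read when the DAG file is parsed, which is exactly the coupling that makes this approach feel brittle:

import os
from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    'upload_transactions',
    start_date=datetime(2019, 1, 1),
    schedule_interval='@daily',
)

get_id_creds = BashOperator(
    task_id='get_id_creds',
    bash_command='python /path/to/extract_ids.py',  # hypothetical extractor
    dag=dag,
)

ids_file = '/tmp/ids.txt'
user_ids = []
if os.path.exists(ids_file):
    # the file only exists after get_id_creds has run at least once
    with open(ids_file) as f:
        user_ids = [line.strip() for line in f if line.strip()]

for user_id in user_ids:
    upload = BashOperator(
        task_id='upload_transaction_{}'.format(user_id),
        bash_command='python /path/to/upload.py {}'.format(user_id),  # placeholder
        dag=dag,
    )
    get_id_creds >> upload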
