apache-airflow

Airflow tasks get stuck at “queued” status and never get running

假装没事ソ submitted on 2019-11-29 03:08:51
I'm using Airflow v1.8.1 and run all components (worker, web, flower, scheduler) on Kubernetes & Docker. I use the Celery Executor with Redis, and my tasks look like:

(start) -> (do_work_for_product1)
         ├ -> (do_work_for_product2)
         ├ -> (do_work_for_product3)
         ├ -> …

So the start task has multiple downstream tasks. I set up the concurrency-related configuration as below:

parallelism = 3
dag_concurrency = 3
max_active_runs = 1

Then when I run this DAG manually (not sure if it never happens on a scheduled run), some downstream tasks get executed, but others get stuck at "queued" status. If I clear the task from …
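For illustration, a minimal sketch of the fan-out shape and settings described above (the task ids, the use of DummyOperator, and setting the limits at the DAG level are assumptions; the global parallelism value can only be set in airflow.cfg):

from datetime import datetime
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

# DAG-level equivalents of the quoted settings; parallelism = 3 still lives
# in airflow.cfg and caps running task instances across all DAGs.
dag = DAG(
    'fan_out_example',
    start_date=datetime(2019, 1, 1),
    schedule_interval='@daily',
    concurrency=3,       # at most 3 running tasks for this DAG (dag_concurrency)
    max_active_runs=1,   # only one active DAG run at a time
)

start = DummyOperator(task_id='start', dag=dag)
for product in ('product1', 'product2', 'product3'):
    # one downstream task per product, all fanning out from start
    start >> DummyOperator(task_id='do_work_for_{}'.format(product), dag=dag)

With parallelism = 3, at most three task instances can run at once across the whole installation, so downstream tasks beyond that limit wait in "queued" until a slot frees up.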

Make custom Airflow macros expand other macros

萝らか妹 submitted on 2019-11-28 09:51:50
Is there any way to make a user-defined macro in Airflow which is itself computed from other macros?

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    'simple',
    schedule_interval='0 21 * * *',
    user_defined_macros={
        'next_execution_date': '{{ dag.following_schedule(execution_date) }}',
    },
)

task = BashOperator(
    task_id='bash_op',
    bash_command='echo "{{ next_execution_date }}"',
    dag=dag,
)

The use case here is to back-port the new Airflow v1.8 next_execution_date macro to work in Airflow v1.7. Unfortunately, this template is rendered without macro expansion …
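One possible workaround (a sketch, not necessarily the answer from the original thread): register a plain Python callable as the user-defined macro instead of a template string. The string value above is substituted verbatim and never re-rendered by Jinja, whereas a callable is evaluated at render time with whatever context objects you pass it. The name compute_next_execution_date is hypothetical:

from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    'simple',
    start_date=datetime(2019, 1, 1),
    schedule_interval='0 21 * * *',
    user_defined_macros={
        # a callable instead of a nested template string
        'compute_next_execution_date': lambda execution_date, dag: dag.following_schedule(execution_date),
    },
)

task = BashOperator(
    task_id='bash_op',
    # execution_date and dag are already in the Jinja context, so they can be
    # passed straight into the callable at render time
    bash_command='echo "{{ compute_next_execution_date(execution_date, dag) }}"',
    dag=dag,
)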

Airflow default on_failure_callback

廉价感情. submitted on 2019-11-28 08:30:36
In my DAG file, I have defined an on_failure_callback() function to post a Slack message in case of failure. It works well if I specify it for each operator in my DAG:

on_failure_callback=on_failure_callback

Is there a way to automate (via default_args for instance, or via my DAG object) the dispatch to all of my operators?

I finally found a way to do that. You can pass your on_failure_callback in default_args:

class Foo:
    @staticmethod
    def get_default_args():
        """
        Return default args
        :return: default_args
        """
        default_args = {
            'on_failure_callback': Foo.on_failure_callback
        }
        return default_args
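A hedged sketch of how the default_args approach could be wired into a DAG so that every operator inherits the callback; the Slack call is replaced by a placeholder print, and names like failure_callback_example are hypothetical:

from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator


def on_failure_callback(context):
    # placeholder: a real callback would post to Slack using the
    # task/dag information available in `context`
    print('Task failed: {}'.format(context.get('task_instance')))


default_args = {
    # picked up by every operator created with this DAG's default_args
    'on_failure_callback': on_failure_callback,
}

dag = DAG(
    'failure_callback_example',
    default_args=default_args,
    start_date=datetime(2019, 1, 1),
    schedule_interval=None,
)

# no on_failure_callback passed here; it is inherited from default_args
task = BashOperator(task_id='always_fails', bash_command='exit 1', dag=dag)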

How to dynamically iterate over the output of an upstream task to create parallel tasks in airflow?

浪子不回头ぞ submitted on 2019-11-27 21:41:28
Consider the following example of a DAG where the first task, get_id_creds, extracts a list of credentials from a database. This operation tells me which users in my database I am able to run further data preprocessing on, and it writes those ids to the file /tmp/ids.txt. I then scan those ids into my DAG and use them to generate a list of upload_transaction tasks that can be run in parallel. My question is: is there a more idiomatically correct, dynamic way to do this using Airflow? What I have here feels clumsy and brittle. How can I directly pass a list of valid IDs from one process to that …
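For context, a sketch of the parse-time fan-out the question describes (the script paths and the upload command are placeholders). The id file is read when the DAG file is parsed, which is exactly the coupling that makes this approach feel brittle:

import os
from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    'upload_transactions',
    start_date=datetime(2019, 1, 1),
    schedule_interval='@daily',
)

get_id_creds = BashOperator(
    task_id='get_id_creds',
    bash_command='python /path/to/extract_ids.py',  # hypothetical extractor
    dag=dag,
)

ids_file = '/tmp/ids.txt'
user_ids = []
if os.path.exists(ids_file):
    # the file only exists after get_id_creds has run at least once
    with open(ids_file) as f:
        user_ids = [line.strip() for line in f if line.strip()]

for user_id in user_ids:
    upload = BashOperator(
        task_id='upload_transaction_{}'.format(user_id),
        bash_command='python /path/to/upload.py {}'.format(user_id),  # placeholder
        dag=dag,
    )
    get_id_creds >> upload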
