airflow

How to control DAG concurrency in Airflow

Submitted by 房东的猫 on 2019-12-10 15:45:25
Question: I use Airflow v1.7.1.3. I have two DAGs, dag_a and dag_b. I set up 10 dag_a runs at one time, which in theory should execute one by one. In reality, the 10 dag_a tasks run in parallel; the concurrency parameter doesn't seem to have any effect. Can anyone tell me why? Here's the pseudocode:

In dag_a.py:

    dag = DAG('dag_a', start_date=datetime.now(),
              default_args=default_args, schedule_interval=None,
              concurrency=1, max_active_runs=1)

In dag_b.py:

    from fabric.api import local
    dag = DAG('dag_b', start…
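For comparison, a minimal self-contained sketch of the intended setup (assuming Airflow 1.x import paths and a placeholder do_work callable): concurrency caps how many task instances of this DAG run at once, and max_active_runs caps how many DAG runs can be active at once.

    from datetime import datetime, timedelta
    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    default_args = {
        'owner': 'airflow',
        'retries': 1,
        'retry_delay': timedelta(minutes=5),
    }

    dag = DAG(
        'dag_a',
        start_date=datetime(2019, 1, 1),  # fixed date; datetime.now() makes scheduling unpredictable
        default_args=default_args,
        schedule_interval=None,
        concurrency=1,        # at most one task instance of this DAG at a time
        max_active_runs=1,    # at most one active DAG run at a time
    )

    def do_work():
        pass  # placeholder for the real work

    work = PythonOperator(task_id='do_work', python_callable=do_work, dag=dag)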

Airflow - Unknown Blue Task Status

Submitted by 血红的双手。 on 2019-12-10 15:14:28
Question: I just got a task coloured in blue, which doesn't appear in the status legend. I'm curious whether this is a bug or an undocumented status. As you can see, the colour blue doesn't show up in the list of possible statuses on the right. For context, I had just finished clearing all past, future, and upstream attempts.

Answer 1: That's a known TaskInstance state; it's just not shown on the UI -- it stands for shutdown: https://github.com/apache/incubator-airflow/blob/master/airflow/utils/state.py#L70 Other statuses that…
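For reference, the task states are defined as constants on the State class; a minimal sketch (assuming the Airflow 1.x module path) that prints a few of them, including the one rendered in blue:

    # Inspect the task-instance states Airflow defines (Airflow 1.x module path).
    from airflow.utils.state import State

    print(State.SHUTDOWN)   # 'shutdown' -- the blue state in question
    print(State.SUCCESS, State.RUNNING, State.FAILED)
    print(State.UP_FOR_RETRY, State.UPSTREAM_FAILED, State.SKIPPED, State.QUEUED)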

Difference between “airflow run” and “airflow test” in Airflow

Submitted by 笑着哭i on 2019-12-10 13:42:07
Question: In Airflow, I have been using "airflow run" and "airflow test", but I don't fully understand how they differ. What are their differences? Thanks!

Answer 1: Reading through the docs myself, I see how it can be confusing. airflow run will run a task instance as if you had triggered it directly through the UI. Perhaps most importantly, the state will be recorded in the database, and that state will be reflected in the UI as if the task had run under automatic circumstances. airflow test will skip…

Airflow DAG success callback

Submitted by 你离开我真会死。 on 2019-12-10 13:38:50
Question: Is there an elegant way to define a callback for a DAG success event? I really don't want to add a final task, downstream of all the other tasks, just to carry an on_success_callback. Thanks!

Answer 1: So if I understand correctly, the last step of your DAG is, in case of success, to call back to some other system. I would encourage you to model your DAG exactly that way. Why would you try to hide that part from the logic of your DAG? That's exactly what the up/downstream modelling is for. Hiding part of the DAG…
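A minimal sketch of what the answer suggests, with a hypothetical notify_success callable as the final task, so it only runs once every upstream task has succeeded:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator
    from airflow.operators.python_operator import PythonOperator

    dag = DAG('my_dag', start_date=datetime(2019, 1, 1), schedule_interval='@daily')

    step_1 = DummyOperator(task_id='step_1', dag=dag)
    step_2 = DummyOperator(task_id='step_2', dag=dag)

    def notify_success(**context):
        # hypothetical call-out to another system once the rest of the DAG succeeded
        print('DAG %s succeeded for %s' % (context['dag'].dag_id, context['ds']))

    notify = PythonOperator(
        task_id='notify_success',
        python_callable=notify_success,
        provide_context=True,  # Airflow 1.x: pass the context kwargs to the callable
        dag=dag,
    )

    # notify runs last; with the default all_success trigger rule it fires only on DAG success
    step_1 >> step_2 >> notify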

How to pass dynamic arguments to an Airflow operator?

Submitted by 主宰稳场 on 2019-12-10 10:23:44
Question: I am using Airflow to run Spark jobs on Google Cloud Composer. I need to: create a cluster (YAML parameters supplied by the user), then run a list of Spark jobs (job parameters also supplied by a per-job YAML). With the Airflow API I can read the YAML files and push variables across tasks using XCom. But consider DataprocClusterCreateOperator(): cluster_name, project_id, zone and a few other arguments are marked as templated. What if I want to pass in other arguments as templated (which currently are not)? - like…
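A common workaround (a sketch under the assumption of the Airflow 1.x contrib import path; the extra field names below are only illustrative) is to subclass the operator and extend its template_fields, so Jinja templating is also applied to those constructor arguments:

    from airflow.contrib.operators.dataproc_operator import DataprocClusterCreateOperator

    class TemplatedDataprocClusterCreateOperator(DataprocClusterCreateOperator):
        # keep the operator's own templated fields and add the extra ones we want rendered
        template_fields = tuple(DataprocClusterCreateOperator.template_fields) + (
            'storage_bucket',       # illustrative: assumed to be a plain string argument
            'worker_machine_type',  # illustrative: assumed to be a plain string argument
        )

An instance of the subclass can then be given values such as '{{ dag_run.conf["worker_machine_type"] }}' for those arguments, and they are rendered at run time like any other templated field.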

Airflow: mark a task as success or skip it before a DAG run

Submitted by 风格不统一 on 2019-12-10 10:15:32
Question: We have a huge DAG, with many small and fast tasks and a few big and time-consuming tasks. We want to run just a part of the DAG, and the easiest way we found is to not add the tasks we don't want to run. The problem is that our DAG has many co-dependencies, so it became a real challenge not to break the DAG when we want to skip some tasks. Is there a way to add a status to a task by default (for every run)? Something like:

    # get the skip list from an env variable
    task_list =…
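One way to get that behaviour (a sketch, assuming a hypothetical SKIP_TASKS environment variable and Airflow 1.x) is to wrap each callable and raise AirflowSkipException when the task is on the skip list, which marks the task instance as skipped instead of running it:

    import os
    from airflow.exceptions import AirflowSkipException
    from airflow.operators.python_operator import PythonOperator

    # hypothetical comma-separated skip list, e.g. SKIP_TASKS="task_b,task_c"
    SKIP_LIST = [t for t in os.environ.get('SKIP_TASKS', '').split(',') if t]

    def skippable(task_id, real_callable):
        """Return a callable that skips itself when task_id is on the skip list."""
        def _run(**context):
            if task_id in SKIP_LIST:
                raise AirflowSkipException('%s is on the skip list' % task_id)
            return real_callable(**context)
        return _run

    # usage sketch: wrap the real callable when building each operator
    # task_b = PythonOperator(task_id='task_b',
    #                         python_callable=skippable('task_b', do_task_b),
    #                         provide_context=True, dag=dag)

Note that with the default all_success trigger rule a skipped task also skips its downstream tasks, so tasks that should still run may need a different trigger_rule (for example all_done).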

Airflow depends_on_past explanation

Submitted by 时光怂恿深爱的人放手 on 2019-12-10 03:28:56
Question: According to the official Airflow docs, "The task instances directly upstream from the task need to be in a success state. Also, if you have set depends_on_past=True, the previous task instance needs to have succeeded (except if it is the first run for that task)." As we all know, a task is a kind of 'instantiated and parameterized' operator. Now this is what confuses me. For example, take the DAG {op_1} -> {op_2} -> {op_3}, where {op_2} is a simple PythonOperator that takes one parameter from {op_1} and does stuff;…
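For illustration, a small sketch of that shape: depends_on_past is set per task, and it gates each task instance on that same task's instance from the previous scheduled run, not on the other tasks within the same run:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    dag = DAG('depends_on_past_example',
              start_date=datetime(2019, 1, 1),
              schedule_interval='@daily')

    op_1 = DummyOperator(task_id='op_1', dag=dag)
    # the 2019-01-02 instance of op_2 runs only if the 2019-01-01 instance
    # of op_2 succeeded or was skipped (depends_on_past applies per task)
    op_2 = DummyOperator(task_id='op_2', depends_on_past=True, dag=dag)
    op_3 = DummyOperator(task_id='op_3', dag=dag)

    op_1 >> op_2 >> op_3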

“Handling signal: ttou” message while running a DAG in Airflow

Submitted by 我的梦境 on 2019-12-10 03:00:57
Question: I have created a sample DAG with the DAG config below.

    default_args = {
        'owner': 'airflow',
        'depends_on_past': False,
        'start_date': one_min_ago,
        'email': ['admin@airflow.com'],
        'email_on_failure': True,
        'email_on_retry': True,
        'retries': 5,
        'retry_delay': timedelta(hours=30)
    }

With this, when I run airflow webserver I get the message below.

    /home/af_user/anaconda/lib/python3.5/site-packages/flask/exthook.py:71: ExtDeprecationWarning: Importing flask.ext.cache is deprecated, use flask…

Airflow installation successful, but unable to run it

Submitted by 故事扮演 on 2019-12-10 02:01:22
Question:

    C:\Python27\Scripts>airflow initdb
    'airflow' is not recognized as an internal or external command, operable program or batch file.

    C:\Python27\Scripts>airflow init
    'airflow' is not recognized as an internal or external command, operable program or batch file.

    C:\Python27\Scripts>airflow webserver -p 8080
    'airflow' is not recognized as an internal or external command, operable program or batch file.

I am trying to install on a Windows 7 machine and I am using Python 2.7.

Answer 1: Airflow doesn't…

Run parallel tasks in Apache Airflow

Submitted by 自作多情 on 2019-12-09 16:38:03
Question: I am able to configure the airflow.cfg file to run tasks one after the other. What I want to do is execute tasks in parallel, e.g. 2 at a time, and reach the end of the list. How can I configure this?

Answer 1: Executing tasks in parallel in Airflow depends on which executor you're using, e.g. SequentialExecutor, LocalExecutor, CeleryExecutor, etc. For a simple setup, you can achieve parallelism by just setting your executor to LocalExecutor in your airflow.cfg:

    [core]
    executor = LocalExecutor
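With LocalExecutor in place, tasks that have no dependency on each other can run at the same time; a minimal sketch (hypothetical DAG and task names) with two independent branches:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    dag = DAG('parallel_example',
              start_date=datetime(2019, 1, 1),
              schedule_interval='@daily')

    start = DummyOperator(task_id='start', dag=dag)
    task_a = DummyOperator(task_id='task_a', dag=dag)
    task_b = DummyOperator(task_id='task_b', dag=dag)
    end = DummyOperator(task_id='end', dag=dag)

    # task_a and task_b do not depend on each other,
    # so with LocalExecutor they can be picked up in parallel
    start >> task_a >> end
    start >> task_b >> end

Parallelism is still capped by the parallelism and dag_concurrency settings in airflow.cfg, so those may also need raising.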