airflow

Make custom Airflow macros expand other macros

萝らか妹 Submitted on 2019-11-28 09:51:50
Is there any way to make a user-defined macro in Airflow which is itself computed from other macros?

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    'simple',
    schedule_interval='0 21 * * *',
    user_defined_macros={
        'next_execution_date': '{{ dag.following_schedule(execution_date) }}',
    },
)

task = BashOperator(
    task_id='bash_op',
    bash_command='echo "{{ next_execution_date }}"',
    dag=dag,
)

The use case here is to back-port the new Airflow v1.8 next_execution_date macro to work in Airflow v1.7. Unfortunately, this template is rendered without macro expansion.
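The workaround usually given for this (a sketch, not part of the excerpt above) is that values in user_defined_macros are not themselves rendered as templates, so the macro has to be exposed as a plain Python callable and called inside the template instead:

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

def compute_next_execution_date(dag, execution_date):
    # dag.following_schedule() is available on the DAG object in Airflow 1.7/1.8
    return dag.following_schedule(execution_date)

dag = DAG(
    'simple_backport',
    schedule_interval='0 21 * * *',
    user_defined_macros={
        'next_execution_date': compute_next_execution_date,
    },
)

task = BashOperator(
    task_id='bash_op',
    bash_command='echo "{{ next_execution_date(dag, execution_date) }}"',
    dag=dag,
)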

Airflow : ExternalTaskSensor doesn't trigger the task

你。 Submitted on 2019-11-28 08:32:15
Question: I have already seen this and this question on SO and made the changes accordingly. However, my dependent DAG still gets stuck in the poking state. Below is my master DAG:

from airflow import DAG
from airflow.operators.jdbc_operator import JdbcOperator
from datetime import datetime
from airflow.operators.bash_operator import BashOperator

today = datetime.today()

default_args = {
    'depends_on_past': False,
    'retries': 0,
    'start_date': datetime(today.year, today.month, today.day),
    'schedule_interval'
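A sketch of the fix that typically resolves the endless poking (the master DAG above is truncated, so the DAG and task names here are assumptions): ExternalTaskSensor pokes for a task instance of the external DAG with the same execution_date as its own run, so if the two DAGs are not on identical schedules the sensor waits forever unless execution_delta (or execution_date_fn) maps one date to the other.

from datetime import datetime, timedelta
from airflow import DAG
from airflow.sensors.external_task_sensor import ExternalTaskSensor

dependent_dag = DAG(
    'dependent_dag',
    start_date=datetime(2019, 1, 1),
    schedule_interval='0 6 * * *',  # assumed: runs one hour after the master DAG
)

wait_for_master = ExternalTaskSensor(
    task_id='wait_for_master',
    external_dag_id='master_dag',        # assumed name of the master DAG
    external_task_id='master_task',      # assumed name of the task to wait for
    execution_delta=timedelta(hours=1),  # master runs at 05:00, this DAG at 06:00
    timeout=600,
    dag=dependent_dag,
)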

Airflow default on_failure_callback

廉价感情. Submitted on 2019-11-28 08:30:36
In my DAG file, I have defined an on_failure_callback() function to post to Slack in case of failure. It works well if I specify it for each operator in my DAG: on_failure_callback=on_failure_callback(). Is there a way to automate the dispatch to all of my operators (via default_args for instance, or via my DAG object)? I finally found a way to do that: you can pass your on_failure_callback via default_args.

class Foo:

    @staticmethod
    def get_default_args():
        """
        Return default args
        :return: default_args
        """
        default_args = {
            'on_failure_callback': Foo.on_failure_callback
        }
        return default_args
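A minimal sketch of how that default_args approach is typically wired up (notify_slack is an assumed placeholder for the Slack-posting callback, not code from the answer): every operator created under these default_args inherits the callback, so it no longer has to be repeated per operator.

from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

def notify_slack(context):
    # context carries task_instance, execution_date, exception, etc.
    print("Task failed:", context['task_instance'].task_id)

default_args = {
    'start_date': datetime(2019, 1, 1),
    'on_failure_callback': notify_slack,  # inherited by every operator in the DAG
}

dag = DAG('slack_alerts', default_args=default_args, schedule_interval='@daily')

task = BashOperator(task_id='may_fail', bash_command='exit 1', dag=dag)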

Airflow error importing DAG using plugin - Relationships can only be set between Operators

生来就可爱ヽ(ⅴ<●) Submitted on 2019-11-28 08:12:45
Question: I have written an Airflow plugin that simply contains one custom operator (to support CMEK in BigQuery). I can create a simple DAG with a single task that uses this operator and it executes fine. However, if I try to create a dependency in the DAG from a DummyOperator task to my custom operator task, the DAG fails to load in the UI and throws the following error, and I can't understand why this error is being thrown:

Broken DAG: [/home/airflow/gcs/dags/js_bq_custom_plugin_v2.py] Relationships can only be set between Operators
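For context, a sketch of the pattern that avoids this error (the plugin's real operator is not shown in the excerpt, so BQCmekOperator and its argument are stand-ins): "Relationships can only be set between Operators" is raised when one side of >> / set_downstream is not a BaseOperator instance, for example a class, a string, or the return value of calling the operator, so both ends of the dependency must be operator instances.

from datetime import datetime
from airflow import DAG
from airflow.models import BaseOperator
from airflow.operators.dummy_operator import DummyOperator
from airflow.utils.decorators import apply_defaults

class BQCmekOperator(BaseOperator):  # stand-in for the custom CMEK operator
    @apply_defaults
    def __init__(self, sql, *args, **kwargs):
        super(BQCmekOperator, self).__init__(*args, **kwargs)
        self.sql = sql

    def execute(self, context):
        self.log.info("Would run query: %s", self.sql)

dag = DAG('cmek_example', start_date=datetime(2019, 1, 1), schedule_interval=None)

start = DummyOperator(task_id='start', dag=dag)
query = BQCmekOperator(task_id='bq_query', sql='SELECT 1', dag=dag)

start >> query  # both sides are operator instances, so the relationship is valid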

Python Airflow - Return result from PythonOperator

烂漫一生 Submitted on 2019-11-28 07:52:20
I have written a DAG with multiple PythonOperators:

task1 = af_op.PythonOperator(task_id='Data_Extraction_Environment',
                             provide_context=True,
                             python_callable=Task1, dag=dag1)

def Task1(**kwargs):
    return kwargs['dag_run'].conf.get('file')

From the PythonOperator I am calling the Task1 method. That method returns a value, and I need to pass that value to the next PythonOperator. How can I get the value from the task1 variable, or how can I get the value returned by the Task1 method?

Updated:

def Task1(**kwargs):
    file_name = kwargs['dag_run'].conf.get[file]
    task_instance = kwargs['task_instance
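A sketch of the usual answer (Task1 and the first task_id come from the question; the downstream task is an assumed example): whatever a PythonOperator's callable returns is pushed to XCom automatically, and the next task can pull it with xcom_pull.

from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

dag1 = DAG('xcom_example', start_date=datetime(2019, 1, 1), schedule_interval=None)

def Task1(**kwargs):
    # the return value is pushed to XCom under this task's id automatically
    return kwargs['dag_run'].conf.get('file')

def Task2(**kwargs):
    # pull the value that the upstream task returned
    file_name = kwargs['task_instance'].xcom_pull(task_ids='Data_Extraction_Environment')
    print("Got file name from XCom:", file_name)

task1 = PythonOperator(task_id='Data_Extraction_Environment',
                       provide_context=True, python_callable=Task1, dag=dag1)
task2 = PythonOperator(task_id='Process_File',  # assumed downstream task
                       provide_context=True, python_callable=Task2, dag=dag1)
task1 >> task2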

How to run Airflow on Windows

亡梦爱人 Submitted on 2019-11-28 07:24:24
The usual instructions for running Airflow do not apply in a Windows environment:

# airflow needs a home, ~/airflow is the default,
# but you can lay foundation somewhere else if you prefer
# (optional)
export AIRFLOW_HOME=~/airflow

# install from pypi using pip
pip install airflow

# initialize the database
airflow initdb

# start the web server, default port is 8080
airflow webserver -p 8080

The airflow utility is not available in the command line and I can't find it elsewhere to be manually added. How can Airflow run on Windows? You can activate bash in Windows and follow the tutorial as is.

Airflow Logs BrokenPipeException

社会主义新天地 Submitted on 2019-11-28 07:17:35
Question: I'm using a clustered Airflow environment where I have four AWS EC2 instances for the servers:

Server 1: Webserver, Scheduler, Redis Queue, PostgreSQL Database
Server 2: Webserver
Server 3: Worker
Server 4: Worker

My setup has been working perfectly fine for three months now, but sporadically, about once a week, I get a Broken Pipe Exception when Airflow is attempting to log something.

*** Log file isn't local.
*** Fetching here: http://ip-1-2-3-4:8793/log/foobar/task_1/2018-07

execution_date in airflow: need to access as a variable

自闭症网瘾萝莉.ら Submitted on 2019-11-28 05:15:52
I am really a newbie in this forum, but I have been playing with Airflow for some time for our company. Sorry if this question sounds really dumb. I am writing a pipeline using a bunch of BashOperators. Basically, for each task, I want to simply call a REST API using curl. This is what my pipeline looks like (very simplified version):

from airflow import DAG
from airflow.operators import BashOperator, PythonOperator
from dateutil import tz
import datetime

datetime_obj = datetime.datetime

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime.datetime.combine
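A sketch of the usual approach (the curl endpoint is a placeholder): execution_date is exposed to every templated field through Jinja, so a BashOperator can reference {{ ds }} (the execution date as YYYY-MM-DD) or {{ execution_date }} directly instead of computing it in Python.

from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG('rest_calls', start_date=datetime(2019, 1, 1), schedule_interval='@daily')

call_api = BashOperator(
    task_id='call_api',
    # {{ ds }} is rendered to the run's execution date before the command executes
    bash_command='curl -X GET "https://example.com/api?date={{ ds }}"',
    dag=dag,
)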

airflow

爱⌒轻易说出口 Submitted on 2019-11-28 03:50:31
Official site: http://airflow.apache.org/installation.html
How it works: https://www.cnblogs.com/cord/p/9450910.html
Installation: https://www.cnblogs.com/cord/p/9226608.html
High-availability deployment, etc.: https://www.jianshu.com/p/2ecef979c606
Usage, etc.: https://www.jianshu.com/p/cbff05e3f125
airflow: https://airflow.apache.org/

[root@node1 ~]# ps -ef |grep 10740
root 4417 10740  2 03:28 ? 00:00:02 [ready] gunicorn: worker [airflow-webserver]
root 4719 10740  3 03:29 ? 00:00:02 [ready] gunicorn: worker [airflow-webserver]
root 5069 10740  6 03:29 ? 00:00:02 [ready] gunicorn: worker [airflow-webserver]
root 7426 10740 47 03:30 ? 00:00:02 [ready]

With code, how do you update an airflow variable?

♀尐吖头ヾ Submitted on 2019-11-28 03:46:38
Question: I need to update a variable I have made in Airflow programmatically, but I cannot find an answer on how to do that with code. I have retrieved my variable with this code:

column_number = Variable.get('column_number')

At the end of the function, I would like to increment column_number by one. I have tried this:

Variable.set_val("column_number", int(column_number) + 1)

and it does not work. Here is the full code for reference:

import airflow
from datetime import datetime, timedelta
from
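A short sketch of the usual fix (assuming an Airflow 1.x install): the Variable model exposes Variable.set(), not set_val(), for writing a value back to the metadata database.

from airflow.models import Variable

column_number = Variable.get('column_number')           # values come back as strings
Variable.set('column_number', int(column_number) + 1)   # persist the incremented value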