airflow

Airflow unpause dag programmatically?

我是研究僧i submitted on 2019-11-30 03:00:56
Question: I have a DAG that we'll deploy to multiple different Airflow instances, and in our airflow.cfg we have dags_are_paused_at_creation = True, but for this specific DAG we want it to be turned on without having to do so manually by clicking in the UI. Is there a way to do it programmatically? Answer 1: The airflow-rest-api-plugin can also be used to programmatically pause DAGs. Pauses a DAG. Available in Airflow version 1.7.0 or greater: GET - http://{HOST}:{PORT}/admin/rest_api/api?api=pause Query
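If the REST plugin is not an option, a minimal sketch of doing it directly against the metadata database is shown below (an illustration, not code from the answer; the DAG id my_dag is a placeholder and the snippet assumes an Airflow 1.x-style DagModel):

```python
# Sketch: unpause a DAG by flipping the is_paused flag on its DagModel row.
# Assumes Airflow 1.x and a reachable metadata database; "my_dag" is a placeholder.
from airflow import settings
from airflow.models import DagModel

session = settings.Session()
dag_entry = session.query(DagModel).filter(DagModel.dag_id == 'my_dag').first()
if dag_entry:
    dag_entry.is_paused = False  # False means the DAG is unpaused
    session.commit()
session.close()
```

The same effect can usually be had from the CLI with `airflow unpause my_dag`.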

How to remove default example dags in airflow

狂风中的少年 submitted on 2019-11-30 01:17:18
I am a new user of Airbnb's open-source workflow/data-pipeline software Airflow. There are dozens of default example DAGs after the web UI is started. I have tried many ways to remove these DAGs, but I've failed to do so. load_examples = False is set in airflow.cfg. The folder lib/python2.7/site-packages/airflow/example_dags has been removed. The states of those example DAGs changed to gray after I removed that folder, but the items still occupy the web UI screen. A new DAG folder is also specified in airflow.cfg as dags_folder = /mnt/dag/1; I checked this folder and nothing is there. It's really weird
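Two things worth checking in a setup like this (a sketch, not from the question): that the running process is actually reading the edited airflow.cfg, and whether the example DAG rows still linger in the metadata database, since load_examples = False only stops new loading and already-registered entries typically remain until the database is reset.

```python
# Sketch: confirm which settings the Airflow process actually sees.
# Assumes Airflow 1.x; run this with the same environment as the webserver.
from airflow.configuration import conf

print(conf.getboolean('core', 'load_examples'))  # expected: False
print(conf.get('core', 'dags_folder'))           # expected: /mnt/dag/1
```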

How to restart a failed task on Airflow

流过昼夜 submitted on 2019-11-30 00:03:44
I am using a LocalExecutor and my DAG has 3 tasks, where task C is dependent on task A. Tasks A and B can run in parallel, something like below: A --> C, with B on its own. Task A has failed, but task B ran fine. Task C is yet to run because task A failed. My question is: how do I re-run task A alone so that task C runs once task A completes, and the Airflow UI marks them as success? In the UI: go to the DAG and the DAG run you want to change, click on Graph View, click on task A, then click "Clear". This will let task A run again, and if it succeeds, task C should run. This works because when you clear a
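For reference, a minimal sketch of the dependency layout described above (operator classes and DAG id are placeholders, not the asker's code); because C depends only on A, clearing A re-queues C once A succeeds:

```python
# Sketch: A and B run in parallel, C waits only on A.
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

dag = DAG('restart_demo', start_date=datetime(2019, 1, 1), schedule_interval=None)

task_a = DummyOperator(task_id='A', dag=dag)
task_b = DummyOperator(task_id='B', dag=dag)
task_c = DummyOperator(task_id='C', dag=dag)

task_a >> task_c  # C depends on A; B is independent
```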

Airflow failed slack message

牧云@^-^@ submitted on 2019-11-29 23:38:32
Question: How can I configure Airflow so that any failure in the DAG will (immediately) result in a Slack message? At the moment I manage it by creating a slack_failed_task: slack_failed_task = SlackAPIPostOperator( task_id='slack_failed', channel="#datalabs", trigger_rule='one_failed', token="...", text=':red_circle: DAG Failed', icon_url='http://airbnb.io/img/projects/airflow3.png', dag=dag) and setting this task (with trigger rule one_failed) upstream from every other task in the DAG: slack_failed_task << download
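A common alternative, shown here only as a sketch rather than the asker's setup, is to attach an on_failure_callback through default_args so every task reports its own failure without wiring an extra task into the graph; the channel and token below are placeholders:

```python
# Sketch: Slack alert via on_failure_callback instead of a one_failed task.
from airflow.operators.slack_operator import SlackAPIPostOperator

def notify_slack_on_failure(context):
    # "context" is supplied by Airflow when the callback fires.
    alert = SlackAPIPostOperator(
        task_id='slack_failed_alert',
        channel='#datalabs',          # placeholder channel
        token='...',                  # placeholder Slack API token
        text=':red_circle: DAG {} failed on task {}'.format(
            context['dag'].dag_id, context['task_instance'].task_id),
        icon_url='http://airbnb.io/img/projects/airflow3.png',
    )
    return alert.execute(context=context)

default_args = {'on_failure_callback': notify_slack_on_failure}
```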

Airflow - How to pass xcom variable into Python function

China☆狼群 submitted on 2019-11-29 23:03:17
I need to reference a variable that's returned by a BashOperator. I may be doing this wrong, so please forgive me. In my task_archive_s3_file, I need to get the filename from get_s3_file. The task simply prints {{ ti.xcom_pull(task_ids=submit_file_to_spark) }} as a string instead of the value. If I use the bash_command, the value prints correctly. get_s3_file = PythonOperator( task_id='get_s3_file', python_callable=obj.func_get_s3_file, trigger_rule=TriggerRule.ALL_SUCCESS, dag=dag) submit_file_to_spark = BashOperator( task_id='submit_file_to_spark', bash_command="echo 'hello world'",
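For context, the usual workaround (a sketch with placeholder names, not the asker's full code): Jinja expressions like {{ ti.xcom_pull(...) }} are only rendered in templated fields, so inside a PythonOperator the value is pulled through the task instance instead.

```python
# Sketch: pull the XCom value inside the Python callable via the task instance.
from airflow.operators.python_operator import PythonOperator

def archive_s3_file(**kwargs):
    # With provide_context=True, Airflow 1.x passes the task instance as "ti".
    filename = kwargs['ti'].xcom_pull(task_ids='submit_file_to_spark')
    print(filename)  # the real value, not the literal template string

task_archive_s3_file = PythonOperator(
    task_id='task_archive_s3_file',
    python_callable=archive_s3_file,
    provide_context=True,
    dag=dag,  # the DAG object from the question
)
```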

Assigning tasks to specific machines with airflow

不打扰是莪最后的温柔 submitted on 2019-11-29 18:52:45
Question: I'm new to Airflow. I have a DAG which contains a task that should run on a specific machine (an EMR cluster in my case). How can I tell Airflow where to run specific tasks, so that every time it runs it will do so on that machine only? Answer 1: Run your worker on that machine with a queue name. In the Airflow CLI you could do something like: airflow worker -q my_queue Then define that task to use that queue: task = PythonOperator( task_id='task', python_callable=my_callable, queue='my_queue',
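Filling in the truncated example as a sketch (the queue name, callable, and DAG are placeholders): only workers started with airflow worker -q my_queue will pick this task up, so starting such a worker on the EMR master effectively pins the task to that machine.

```python
# Sketch: pin a task to workers listening on a specific queue.
from airflow.operators.python_operator import PythonOperator

def my_callable():
    pass  # work that must run on the EMR cluster

task = PythonOperator(
    task_id='task_on_emr',
    python_callable=my_callable,
    queue='my_queue',  # matches: airflow worker -q my_queue
    dag=dag,           # the DAG object from the question
)
```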

How to pull xcom value from other task instance in the same DAG run (not the most recent one)?

ぃ、小莉子 submitted on 2019-11-29 17:37:38
I have 3 DAG runs: DAGR 1 executed at 2019-02-13 16:00:00, DAGR 2 executed at 2019-02-13 17:00:00, and DAGR 3 executed at 2019-02-13 18:00:00. In task instance X of DAGR 1, I want to get the XCom value of task instance Y. I did this: kwargs['task_instance'].xcom_pull(task_ids='Y') I expected to get the XCom value from task instance Y in DAGR 1; instead I got the one from DAGR 3. From the Airflow documentation: "If xcom_pull is passed a single string for task_ids, then the most recent XCom value from that task is returned; ..." Why does Airflow's xcom_pull return the most recent XCom value? What if I want to pull from the
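One way to pin the pull to a particular DAG run, shown as a sketch under the assumption of Airflow 1.10's XCom model API (the dag_id and timestamp below are placeholders), is to query the XCom table by execution_date instead of relying on xcom_pull's default behaviour:

```python
# Sketch: read task Y's XCom from the DAG run executed at a specific time.
# Assumes Airflow 1.10's XCom.get_one classmethod; adjust for timezone handling,
# since execution dates are stored as timezone-aware UTC values.
from datetime import datetime

from airflow.models import XCom

value = XCom.get_one(
    execution_date=datetime(2019, 2, 13, 16, 0, 0),  # DAGR 1's execution date
    task_id='Y',
    dag_id='my_dag',     # placeholder DAG id
    key='return_value',  # default key for values returned by operators
)
```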

Airflow 1.10 Installation Failing

半世苍凉 submitted on 2019-11-29 16:39:53
I have a working Airflow environment using Airflow version 1.9 that is running on an Amazon EC2 instance. I need to upgrade to the latest version of Airflow, which is 1.10. I have the option of either upgrading from version 1.9 or installing 1.10 freshly on a new server. Airflow version 1.10 is not yet listed on PyPI, so I'm installing it from Git via this command: pip-3.6 install git+git://github.com/apache/incubator-airflow.git@v1-10-stable This command successfully installs Airflow version 1.10. You can see that by running the command airflow version and viewing the output,

Airflow Installation Tutorial

ⅰ亾dé卋堺 submitted on 2019-11-29 15:49:34
Note: Installing Airflow requires Python 3.0 or above; a Python 3 installation tutorial is available at https://blog.csdn.net/CZ_yjsy_data/article/details/100776239. Online installation steps: the simplest way to install the latest stable version of Airflow is with pip.
1. Airflow needs a home; ~/airflow is the default, but you can lay the foundation somewhere else if you prefer: export AIRFLOW_HOME=~/airflow
2. Install from PyPI using pip: pip3 install apache-airflow
3. Initialize the database: airflow initdb
4. Start the web server (the default port is 8080): airflow webserver -p 8080
5. Start the scheduler: airflow scheduler
Visit localhost:8080 in the browser and enable the example DAG on the home page.
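A quick post-install sanity check (a sketch, not part of the original tutorial) before starting the webserver and scheduler:

```python
# Sketch: confirm the installed Airflow version and where AIRFLOW_HOME points.
import os
import airflow

print(airflow.__version__)
print(os.environ.get('AIRFLOW_HOME', os.path.expanduser('~/airflow')))
```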

Airflow HiveCliHook connection to remote hive cluster?

拟墨画扇 submitted on 2019-11-29 15:38:13
I am trying to connect to my Hive server from a local copy of Airflow, but it seems like the HiveCliHook is trying to connect to my local copy of Hive. I'm running the following to test it: import airflow from airflow.models import Connection from airflow.hooks.hive_hooks import HiveCliHook usr = 'myusername' pss = 'mypass' session = airflow.settings.Session() hive_cli = session.query(Connection).filter(Connection.conn_id == 'hive_cli_default').all()[0] hive_cli.host = 'hive_server.test.mydomain.com' hive_cli.port = '9083' hive_cli.login = usr hive_cli.password = pss hive_cli.schema = 'default'
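A possible continuation, written as a sketch rather than the asker's actual code: commit the edited connection and instantiate the hook against it. Note that HiveCliHook shells out to the local hive CLI, so that binary still has to be able to reach the remote server.

```python
# Sketch: persist the modified connection, then run a statement through the hook.
from airflow.hooks.hive_hooks import HiveCliHook

session.commit()  # "session" and "hive_cli" come from the snippet above

hook = HiveCliHook(hive_cli_conn_id='hive_cli_default')
hook.run_cli('SHOW DATABASES;')
```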