airflow

How to access the response from Airflow SimpleHttpOperator GET request

别等时光非礼了梦想. Submitted on 2019-12-06 00:43:30
Question: I'm learning Airflow and have a simple question. Below is my DAG, called dog_retriever:

import airflow
from airflow import DAG
from airflow.operators.http_operator import SimpleHttpOperator
from airflow.operators.sensors import HttpSensor
from datetime import datetime, timedelta
import json

default_args = {
    'owner': 'Loftium',
    'depends_on_past': False,
    'start_date': datetime(2017, 10, 9),
    'email': 'rachel@loftium.com',
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 3,
    'retry
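The excerpt is cut off, but the question in the title (how to read the response body of the GET request) is usually handled with XComs. A minimal sketch, assuming Airflow 1.x (where SimpleHttpOperator accepts xcom_push=True) and a hypothetical connection and endpoint:

from datetime import datetime

from airflow import DAG
from airflow.operators.http_operator import SimpleHttpOperator
from airflow.operators.python_operator import PythonOperator

dag = DAG('dog_retriever_sketch',
          start_date=datetime(2017, 10, 9),
          schedule_interval=None)

# xcom_push=True makes the operator store the response body as an XCom.
get_dog = SimpleHttpOperator(
    task_id='get_dog',
    http_conn_id='http_default',         # assumed connection pointing at the API host
    endpoint='api/breeds/image/random',  # hypothetical endpoint
    method='GET',
    xcom_push=True,
    dag=dag,
)

def print_response(**context):
    # Pull the response text that get_dog pushed to XCom.
    body = context['ti'].xcom_pull(task_ids='get_dog')
    print(body)

show_response = PythonOperator(
    task_id='show_response',
    python_callable=print_response,
    provide_context=True,
    dag=dag,
)

get_dog >> show_response

In Airflow 1.x, xcom_push=True makes the operator return response.text, which Airflow stores as the task's XCom; any downstream task can then pull it by task_id, as print_response does above.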

Creating connection outside of Airflow GUI

瘦欲@ Submitted on 2019-12-05 23:37:10
Question: I would like to create an S3 connection without interacting with the Airflow GUI. Is it possible through airflow.cfg or the command line? We are using an AWS role, and the following connection parameter works for us: {"aws_account_id":"xxxx","role_arn":"yyyyy"}. So manually creating the connection for S3 in the GUI works; now we want to automate this and add it to the Airflow deployment process. Any workaround? Answer 1: You can use the airflow CLI. Unfortunately there is no support for editing
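The answer is truncated right where the CLI command would appear, so here is a hedged alternative that does not depend on that part: a small Python sketch, assuming Airflow 1.x, that inserts the connection straight into the metadata database (the conn_id my_s3 and the extra values are placeholders):

import json

from airflow import settings
from airflow.models import Connection

# Placeholder connection id and role details; substitute real values.
conn = Connection(
    conn_id='my_s3',
    conn_type='s3',
    extra=json.dumps({'aws_account_id': 'xxxx', 'role_arn': 'yyyyy'}),
)

session = settings.Session()
# Only insert if it does not exist yet, so the deployment step is idempotent.
if not session.query(Connection).filter(Connection.conn_id == conn.conn_id).first():
    session.add(conn)
    session.commit()
session.close()

The same result can be achieved with the airflow connections --add CLI command (Airflow 1.10+) or by exporting an AIRFLOW_CONN_MY_S3 environment variable; the snippet above is just the in-Python variant that fits a deployment script.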

Airflow: DAG marked as “success” if one task fails, because of trigger rule ALL_DONE

ぐ巨炮叔叔 Submitted on 2019-12-05 22:59:25
I have the following DAG with 3 tasks: start --> special_task --> end. The task in the middle can succeed or fail, but end must always be executed (imagine this is a task for cleanly closing resources). For that, I used the trigger rule ALL_DONE:

end.trigger_rule = trigger_rule.TriggerRule.ALL_DONE

Using that, end is properly executed if special_task fails. However, since end is the last task and succeeds, the DAG is always marked as SUCCESS. How can I configure my DAG so that if one of the tasks failed, the whole DAG is marked as FAILED? Example to reproduce:

import datetime
from airflow
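The excerpt stops before any answer. One common workaround (a sketch of a general pattern, not something quoted from this page) is a "watcher" task with trigger rule ONE_FAILED: it runs only when an upstream task has failed and then raises, which marks the DAG run itself as failed. A self-contained sketch, assuming Airflow 1.x:

import datetime

from airflow import DAG
from airflow.exceptions import AirflowException
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator
from airflow.utils.trigger_rule import TriggerRule

dag = DAG('watcher_sketch',
          start_date=datetime.datetime(2019, 1, 1),
          schedule_interval=None)

start = DummyOperator(task_id='start', dag=dag)

def _may_fail():
    # Stand-in for the real work; raise to simulate a failure.
    raise AirflowException('simulating a failure in special_task')

special_task = PythonOperator(task_id='special_task', python_callable=_may_fail, dag=dag)

# end always runs, even when special_task fails.
end = DummyOperator(task_id='end', trigger_rule=TriggerRule.ALL_DONE, dag=dag)

def _fail_dag():
    # Runs only when at least one upstream task failed; raising here
    # makes this final leaf task fail, and with it the DAG run.
    raise AirflowException('An upstream task failed, failing the DAG run.')

watcher = PythonOperator(
    task_id='watcher',
    python_callable=_fail_dag,
    trigger_rule=TriggerRule.ONE_FAILED,
    dag=dag,
)

start >> special_task >> end
[start, special_task, end] >> watcher

Because the watcher is a leaf task, its failure is what flips the overall DAG run state to failed, while end still runs and cleans up.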

ImportError: cannot import DAG airflow

假如想象 Submitted on 2019-12-05 22:16:47
I have simple code; I am trying to import DAG from airflow:

from airflow import DAG
from airflow.operators import BashOperator, S3KeySensor
from datetime import datetime, timedelta
import psycopg2
from datetime import date, timedelta

yesterday = date.today() - timedelta(1)
yesterdayDate = yesterday.strftime('%Y-%m-%d')

But I am getting an ImportError:

Traceback (most recent call last):
  File "airflow.py", line 9, in <module>
    from airflow import DAG
  File "/home/ubuntu/airflow/dags/airflow.py", line 9, in <module>
    from airflow import DAG
ImportError: cannot import name DAG

apache-airflow version
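The excerpt ends before an answer, but the traceback itself points at the usual cause: the DAG file is named airflow.py, so from airflow import DAG resolves to the user's own module instead of the installed package. A hedged sketch of the fix, assuming Airflow 1.x, is simply to give the file another name (my_dag.py below is a hypothetical choice) and keep the imports:

# my_dag.py -- any name other than airflow.py avoids shadowing the installed package
from datetime import datetime

from airflow import DAG                      # now resolves to the real airflow package
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id='rename_example',                 # hypothetical dag_id
    start_date=datetime(2019, 1, 1),
    schedule_interval=None,
)

hello = BashOperator(task_id='hello', bash_command='echo hello', dag=dag)

If a stale airflow.pyc compiled from the old file is still sitting in the dags folder, it can keep shadowing the package and should be deleted as well.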

Airflow, mark a task success or skip it before dag run

喜夏-厌秋 Submitted on 2019-12-05 20:33:05
We have a huge DAG, with many small and fast tasks and a few big and time-consuming tasks. We want to run just a part of the DAG, and the easiest way we found is to not add the tasks that we don't want to run. The problem is that our DAG has many co-dependencies, so it became a real challenge not to break the DAG when we want to skip some tasks. Is there a way to add a status to a task by default (for every run)? Something like:

# get the skip list from an env variable
task_list = models.Variable.get('list_of_tasks_to_skip')
dag.skip(task_list)

or:

for task in task_list:
    task.status =
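There is no dag.skip() call like the pseudocode above; one workaround (a sketch of a common pattern, not an answer quoted from this page) is to read the skip list from an Airflow Variable at parse time and build no-op placeholders for the skipped tasks, so every dependency edge stays in place. Assuming Airflow 1.x and a Variable named list_of_tasks_to_skip holding a comma-separated list of task ids:

from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator

# Comma-separated task ids to skip, e.g. "big_slow_task".
tasks_to_skip = set(Variable.get('list_of_tasks_to_skip', default_var='').split(','))

dag = DAG('skip_list_sketch', start_date=datetime(2019, 1, 1), schedule_interval=None)

def make_task(task_id, python_callable):
    # Build either the real task or a no-op placeholder, so the dependency
    # graph is identical whether or not the task is skipped.
    if task_id in tasks_to_skip:
        return DummyOperator(task_id=task_id, dag=dag)
    return PythonOperator(task_id=task_id, python_callable=python_callable, dag=dag)

fast = make_task('fast_task', lambda: print('fast work'))
slow = make_task('big_slow_task', lambda: print('slow work'))
fast >> slow

Because the Variable is read every time the scheduler parses the file, editing it is enough to toggle tasks on or off without redeploying code (at the cost of one metadata-database lookup per parse).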

How do I add a new dag to a running airflow service?

若如初见. Submitted on 2019-12-05 18:27:55
I have an Airflow service that is currently running as separate Docker containers for the webserver and scheduler, both backed by a Postgres database. I have the DAGs synced between the two instances, and the DAGs load appropriately when the services start. However, if I add a new DAG to the dag folder (on both containers) while the service is running, the DAG gets loaded into the DagBag but shows up in the web GUI with missing metadata. I can run "airflow initdb" after each update, but that doesn't feel right. Is there a better way for the scheduler and webserver to sync up with the database?

Airflow: dag_id could not be found

扶醉桌前 Submitted on 2019-12-05 18:04:15
Question: I'm running an Airflow server and worker on different AWS machines. I've synced the dags folder between them, ran airflow initdb on both, and checked that the dag_ids are the same when I run airflow list_tasks <dag_id>. When I run the scheduler and worker, I get this error on the worker:

airflow.exceptions.AirflowException: dag_id could not be found: . Either the dag did not exist or it failed to parse. [...] Command ...--local -sd /home/ubuntu/airflow/dags/airflow_tutorial.py'

What seems to

How to use airflow with Celery

拜拜、爱过 Submitted on 2019-12-05 17:43:10
Question: I'm new to Airflow and Celery. I have finished writing my DAG, but I want to run tasks on two computers that are in the same subnet, and I want to know how to modify airflow.cfg for that. Some examples would help. Thanks for any answers. Answer 1: The Airflow documentation covers this quite nicely: First, you will need a Celery backend. This can be, for example, Redis or RabbitMQ. Then, the executor parameter in your airflow.cfg should be set to CeleryExecutor. Then, in the celery section of

Apache Airflow - customize logging format

ⅰ亾dé卋堺 Submitted on 2019-12-05 14:39:51
Is it possible to customize the format that Airflow uses for logging? I tried adding a LOG_FORMAT variable in $AIRFLOW_HOME/airflow.cfg, but it doesn't seem to take effect:

LOG_FORMAT = "%(asctime)s logLevel=%(levelname)s logger=%(name)s - %(message)s"

Answer (Priyank Mehta): You need to change the settings.py file in the airflow package to change the log format. Update settings.py (after LOGGING_LEVEL add the line below):

LOG_FORMAT = os.path.expanduser(conf.get('core', 'LOG_FORMAT'))

Update the airflow.cfg configuration file: add a line under [core]:

LOG_FORMAT = "%(asctime)s logLevel=%(levelname)s logger=%(name
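Putting the truncated answer together: the format string lives in airflow.cfg under [core], and settings.py reads it and hands it to Python's logging module. A hedged sketch of what the edited section of settings.py would roughly look like, assuming an old Airflow release that still configures logging in settings.py (surrounding code varies by version):

# Sketch of the relevant part of airflow/settings.py after the edit
# (old Airflow versions only; details differ between releases).
import logging
import os

from airflow import configuration as conf

LOGGING_LEVEL = logging.INFO

# New line, added right after LOGGING_LEVEL: read the format string that the
# [core] section of airflow.cfg now provides.
LOG_FORMAT = os.path.expanduser(conf.get('core', 'LOG_FORMAT'))

def configure_logging():
    # Hand the custom format to the standard logging module.
    logging.basicConfig(level=LOGGING_LEVEL, format=LOG_FORMAT)

More recent Airflow releases expose a log_format option (and a logging_config_class) in airflow.cfg directly, so patching the installed package should not be necessary there.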