airflow

How to access the response from Airflow SimpleHttpOperator GET request

别等时光非礼了梦想. Submitted on 2019-12-06 00:43:30
Question: I'm learning Airflow and have a simple question. Below is my DAG, called dog_retriever:

import airflow
from airflow import DAG
from airflow.operators.http_operator import SimpleHttpOperator
from airflow.operators.sensors import HttpSensor
from datetime import datetime, timedelta
import json

default_args = {
    'owner': 'Loftium',
    'depends_on_past': False,
    'start_date': datetime(2017, 10, 9),
    'email': 'rachel@loftium.com',
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 3,
    'retry
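The excerpt is cut off, but the question in the title (how to read the response body of the GET request) is usually handled with XComs. A minimal sketch, assuming Airflow 1.x (where SimpleHttpOperator accepts xcom_push=True) and a hypothetical connection and endpoint:

from datetime import datetime

from airflow import DAG
from airflow.operators.http_operator import SimpleHttpOperator
from airflow.operators.python_operator import PythonOperator

dag = DAG('dog_retriever_sketch',
          start_date=datetime(2017, 10, 9),
          schedule_interval=None)

# xcom_push=True makes the operator store the response body as an XCom.
get_dog = SimpleHttpOperator(
    task_id='get_dog',
    http_conn_id='http_default',         # assumed connection pointing at the API host
    endpoint='api/breeds/image/random',  # hypothetical endpoint
    method='GET',
    xcom_push=True,
    dag=dag,
)

def print_response(**context):
    # Pull the response text that get_dog pushed to XCom.
    body = context['ti'].xcom_pull(task_ids='get_dog')
    print(body)

show_response = PythonOperator(
    task_id='show_response',
    python_callable=print_response,
    provide_context=True,
    dag=dag,
)

get_dog >> show_response

In Airflow 1.x, xcom_push=True makes the operator return response.text, which Airflow stores as the task's XCom; any downstream task can then pull it by task_id, as print_response does above.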

Creating connection outside of Airflow GUI

瘦欲@ Submitted on 2019-12-05 23:37:10
Question: I would like to create an S3 connection without interacting with the Airflow GUI. Is it possible through airflow.cfg or the command line? We are using an AWS role, and the following connection parameter works for us: {"aws_account_id":"xxxx","role_arn":"yyyyy"}. So manually creating the connection for S3 in the GUI works; now we want to automate this and add it to the Airflow deployment process. Any workaround? Answer 1: You can use the airflow CLI. Unfortunately there is no support for editing
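The answer is truncated right where the CLI command would appear, so here is a hedged alternative that does not depend on that part: a small Python sketch, assuming Airflow 1.x, that inserts the connection straight into the metadata database (the conn_id my_s3 and the extra values are placeholders):

import json

from airflow import settings
from airflow.models import Connection

# Placeholder connection id and role details; substitute real values.
conn = Connection(
    conn_id='my_s3',
    conn_type='s3',
    extra=json.dumps({'aws_account_id': 'xxxx', 'role_arn': 'yyyyy'}),
)

session = settings.Session()
# Only insert if it does not exist yet, so the deployment step is idempotent.
if not session.query(Connection).filter(Connection.conn_id == conn.conn_id).first():
    session.add(conn)
    session.commit()
session.close()

The same result can be achieved with the airflow connections --add CLI command (Airflow 1.10+) or by exporting an AIRFLOW_CONN_MY_S3 environment variable; the snippet above is just the in-Python variant that fits a deployment script.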

Airflow: DAG marked as “success” if one task fails, because of trigger rule ALL_DONE

ぐ巨炮叔叔 Submitted on 2019-12-05 22:59:25
I have the following DAG with 3 tasks: start --> special_task --> end. The task in the middle can succeed or fail, but end must always be executed (imagine this is a task for cleanly closing resources). For that, I used the trigger rule ALL_DONE:

end.trigger_rule = trigger_rule.TriggerRule.ALL_DONE

Using that, end is properly executed if special_task fails. However, since end is the last task and succeeds, the DAG is always marked as SUCCESS. How can I configure my DAG so that if one of the tasks failed, the whole DAG is marked as FAILED? Example to reproduce:

import datetime
from airflow
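The excerpt stops before any answer. One common workaround (a sketch of a general pattern, not something quoted from this page) is a "watcher" task with trigger rule ONE_FAILED: it runs only when an upstream task has failed and then raises, which marks the DAG run itself as failed. A self-contained sketch, assuming Airflow 1.x:

import datetime

from airflow import DAG
from airflow.exceptions import AirflowException
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator
from airflow.utils.trigger_rule import TriggerRule

dag = DAG('watcher_sketch',
          start_date=datetime.datetime(2019, 1, 1),
          schedule_interval=None)

start = DummyOperator(task_id='start', dag=dag)

def _may_fail():
    # Stand-in for the real work; raise to simulate a failure.
    raise AirflowException('simulating a failure in special_task')

special_task = PythonOperator(task_id='special_task', python_callable=_may_fail, dag=dag)

# end always runs, even when special_task fails.
end = DummyOperator(task_id='end', trigger_rule=TriggerRule.ALL_DONE, dag=dag)

def _fail_dag():
    # Runs only when at least one upstream task failed; raising here
    # makes this final leaf task fail, and with it the DAG run.
    raise AirflowException('An upstream task failed, failing the DAG run.')

watcher = PythonOperator(
    task_id='watcher',
    python_callable=_fail_dag,
    trigger_rule=TriggerRule.ONE_FAILED,
    dag=dag,
)

start >> special_task >> end
[start, special_task, end] >> watcher

Because the watcher is a leaf task, its failure is what flips the overall DAG run state to failed, while end still runs and cleans up.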

ImportError: cannot import DAG airflow

假如想象 Submitted on 2019-12-05 22:16:47
I have simple code; I am trying to import DAG from airflow:

from airflow import DAG
from airflow.operators import BashOperator, S3KeySensor
from datetime import datetime, timedelta
import psycopg2
from datetime import date, timedelta

yesterday = date.today() - timedelta(1)
yesterdayDate = yesterday.strftime('%Y-%m-%d')

But I am getting an ImportError:

Traceback (most recent call last):
  File "airflow.py", line 9, in <module>
    from airflow import DAG
  File "/home/ubuntu/airflow/dags/airflow.py", line 9, in <module>
    from airflow import DAG
ImportError: cannot import name DAG

apache-airflow version
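The excerpt ends before an answer, but the traceback itself points at the usual cause: the DAG file is named airflow.py, so from airflow import DAG resolves to the user's own module instead of the installed package. A hedged sketch of the fix, assuming Airflow 1.x, is simply to give the file another name (my_dag.py below is a hypothetical choice) and keep the imports:

# my_dag.py -- any name other than airflow.py avoids shadowing the installed package
from datetime import datetime

from airflow import DAG                      # now resolves to the real airflow package
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id='rename_example',                 # hypothetical dag_id
    start_date=datetime(2019, 1, 1),
    schedule_interval=None,
)

hello = BashOperator(task_id='hello', bash_command='echo hello', dag=dag)

If a stale airflow.pyc compiled from the old file is still sitting in the dags folder, it can keep shadowing the package and should be deleted as well.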

Airflow, mark a task success or skip it before dag run

喜夏-厌秋 Submitted on 2019-12-05 20:33:05
We have a huge DAG, with many small and fast tasks and a few big and time-consuming tasks. We want to run just a part of the DAG, and the easiest way we found is to not add the tasks that we don't want to run. The problem is that our DAG has many co-dependencies, so it became a real challenge not to break the DAG when we want to skip some tasks. Is there a way to add a status to a task by default (for every run)? Something like:

# get the skip list from an env variable
task_list = models.Variable.get('list_of_tasks_to_skip')
dag.skip(task_list)

or:

for task in task_list:
    task.status =
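There is no dag.skip() call like the pseudocode above; one workaround (a sketch of a common pattern, not an answer quoted from this page) is to read the skip list from an Airflow Variable at parse time and build no-op placeholders for the skipped tasks, so every dependency edge stays in place. Assuming Airflow 1.x and a Variable named list_of_tasks_to_skip holding a comma-separated list of task ids:

from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator

# Comma-separated task ids to skip, e.g. "big_slow_task".
tasks_to_skip = set(Variable.get('list_of_tasks_to_skip', default_var='').split(','))

dag = DAG('skip_list_sketch', start_date=datetime(2019, 1, 1), schedule_interval=None)

def make_task(task_id, python_callable):
    # Build either the real task or a no-op placeholder, so the dependency
    # graph is identical whether or not the task is skipped.
    if task_id in tasks_to_skip:
        return DummyOperator(task_id=task_id, dag=dag)
    return PythonOperator(task_id=task_id, python_callable=python_callable, dag=dag)

fast = make_task('fast_task', lambda: print('fast work'))
slow = make_task('big_slow_task', lambda: print('slow work'))
fast >> slow

Because the Variable is read every time the scheduler parses the file, editing it is enough to toggle tasks on or off without redeploying code (at the cost of one metadata-database lookup per parse).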

How do I add a new dag to a running airflow service?

若如初见. Submitted on 2019-12-05 18:27:55
I have an Airflow service that is currently running as separate Docker containers for the webserver and scheduler, both backed by a Postgres database. I have the DAGs synced between the two instances, and the DAGs load appropriately when the services start. However, if I add a new DAG to the dag folder (on both containers) while the service is running, the DAG gets loaded into the DagBag but shows up in the web GUI with missing metadata. I can run "airflow initdb" after each update, but that doesn't feel right. Is there a better way for the scheduler and webserver to sync up with the database?

Airflow: dag_id could not be found

扶醉桌前 Submitted on 2019-12-05 18:04:15
Question: I'm running an Airflow server and worker on different AWS machines. I've synced the dags folder between them, ran airflow initdb on both, and checked that the dag_ids are the same when I run airflow list_tasks <dag_id>. When I run the scheduler and worker, I get this error on the worker:

airflow.exceptions.AirflowException: dag_id could not be found: . Either the dag did not exist or it failed to parse. [...] Command ...--local -sd /home/ubuntu/airflow/dags/airflow_tutorial.py'

What seems to

How to use airflow with Celery

拜拜、爱过 Submitted on 2019-12-05 17:43:10
Question: I'm new to Airflow and Celery. I have finished writing my DAG, but I want to run tasks on two computers that are in the same subnet, and I want to know how to modify airflow.cfg for that. Some examples would help. Thanks for any answers. Answer 1: The Airflow documentation covers this quite nicely: First, you will need a Celery backend. This can be, for example, Redis or RabbitMQ. Then, the executor parameter in your airflow.cfg should be set to CeleryExecutor. Then, in the celery section of

Apache Airflow - customize logging format

ⅰ亾dé卋堺 Submitted on 2019-12-05 14:39:51
Is it possible to customize the format that Airflow uses for logging? I tried adding a LOG_FORMAT variable in $AIRFLOW_HOME/airflow.cfg, but it doesn't seem to take effect:

LOG_FORMAT = "%(asctime)s logLevel=%(levelname)s logger=%(name)s - %(message)s"

Answer (Priyank Mehta): You need to change the settings.py file in the airflow package to change the log format. Update settings.py (after LOGGING_LEVEL add the line below):

LOG_FORMAT = os.path.expanduser(conf.get('core', 'LOG_FORMAT'))

Update the airflow.cfg configuration file: add a line under [core]:

LOG_FORMAT = "%(asctime)s logLevel=%(levelname)s logger=%(name
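Putting the truncated answer together: the format string lives in airflow.cfg under [core], and settings.py reads it and hands it to Python's logging module. A hedged sketch of what the edited section of settings.py would roughly look like, assuming an old Airflow release that still configures logging in settings.py (surrounding code varies by version):

# Sketch of the relevant part of airflow/settings.py after the edit
# (old Airflow versions only; details differ between releases).
import logging
import os

from airflow import configuration as conf

LOGGING_LEVEL = logging.INFO

# New line, added right after LOGGING_LEVEL: read the format string that the
# [core] section of airflow.cfg now provides.
LOG_FORMAT = os.path.expanduser(conf.get('core', 'LOG_FORMAT'))

def configure_logging():
    # Hand the custom format to the standard logging module.
    logging.basicConfig(level=LOGGING_LEVEL, format=LOG_FORMAT)

More recent Airflow releases expose a log_format option (and a logging_config_class) in airflow.cfg directly, so patching the installed package should not be necessary there.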