apache-airflow

How to access the response from Airflow SimpleHttpOperator GET request

Submitted by 别等时光非礼了梦想 on 2019-12-06 00:43:30
Question: I'm learning Airflow and have a simple question. Below is my DAG, called dog_retriever:

    import airflow
    from airflow import DAG
    from airflow.operators.http_operator import SimpleHttpOperator
    from airflow.operators.sensors import HttpSensor
    from datetime import datetime, timedelta
    import json

    default_args = {
        'owner': 'Loftium',
        'depends_on_past': False,
        'start_date': datetime(2017, 10, 9),
        'email': 'rachel@loftium.com',
        'email_on_failure': False,
        'email_on_retry': False,
        'retries': 3,
        'retry
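
A common way to get at the response body is to have SimpleHttpOperator push it to XCom and read it in a downstream task. A minimal sketch, assuming Airflow 1.x import paths and an illustrative 'dog_api' HTTP connection (the endpoint, task ids and connection id are placeholders, not taken from the original DAG):

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.http_operator import SimpleHttpOperator
    from airflow.operators.python_operator import PythonOperator

    dag = DAG('dog_retriever', start_date=datetime(2017, 10, 9),
              schedule_interval='@daily')

    # xcom_push=True makes the operator store the raw response text as an XCom
    get_dog = SimpleHttpOperator(
        task_id='get_dog',
        http_conn_id='dog_api',              # assumed connection pointing at the API host
        endpoint='api/breeds/image/random',  # illustrative endpoint
        method='GET',
        xcom_push=True,
        dag=dag)

    def print_response(**context):
        # Pull the response body pushed by the HTTP task
        response = context['ti'].xcom_pull(task_ids='get_dog')
        print(response)

    show = PythonOperator(
        task_id='show_response',
        python_callable=print_response,
        provide_context=True,
        dag=dag)

    get_dog >> show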

Creating connection outside of Airflow GUI

Submitted by 瘦欲@ on 2019-12-05 23:37:10
Question: I would like to create an S3 connection without interacting with the Airflow GUI. Is it possible through airflow.cfg or the command line? We are using an AWS role, and the following connection parameter works for us: {"aws_account_id":"xxxx","role_arn":"yyyyy"} So, manually creating the connection for S3 in the GUI works; now we want to automate this and add it as part of the Airflow deployment process. Any workaround? Answer 1: You can use the airflow CLI. Unfortunately there is no support for editing
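
Besides the CLI (whose flags vary by Airflow version, so check `airflow connections --help`), a hedged sketch of creating the connection programmatically at deploy time with Airflow's own models; the connection id 'my_s3' is an assumption:

    from airflow import settings
    from airflow.models import Connection

    conn = Connection(
        conn_id='my_s3',      # illustrative connection id
        conn_type='s3',
        extra='{"aws_account_id": "xxxx", "role_arn": "yyyyy"}')

    session = settings.Session()
    # Avoid duplicates if the deployment script runs more than once
    if not session.query(Connection).filter(Connection.conn_id == conn.conn_id).first():
        session.add(conn)
        session.commit()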

Running an Airflow DAG every X minutes

Submitted by 我的未来我决定 on 2019-12-05 08:13:19
I am using Airflow on an EC2 instance with the LocalScheduler option. I've invoked airflow scheduler and airflow webserver and everything seems to be running fine. That said, after supplying the cron string '*/10 * * * *' to schedule_interval for "do this every 10 minutes," the job continues to execute every 24 hours by default. Here's the header of the code:

    from datetime import datetime
    import os
    import sys
    from airflow.models import DAG
    from airflow.operators.python_operator import PythonOperator
    import ds_dependencies

    SCRIPT_PATH = os.getenv('PREPROC_PATH')
    if SCRIPT_PATH:
        sys.path
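
For comparison, a minimal sketch of a DAG that fires every 10 minutes; the key point is that the cron string must be passed to the DAG itself, not to an operator or only via default_args. The dag_id and task body below are illustrative:

    from datetime import datetime, timedelta
    from airflow.models import DAG
    from airflow.operators.python_operator import PythonOperator

    dag = DAG(
        dag_id='preprocessing',
        start_date=datetime(2017, 10, 9),
        schedule_interval='*/10 * * * *',   # cron string set on the DAG
        default_args={'retries': 1, 'retry_delay': timedelta(minutes=1)})

    def run_preprocessing():
        print('running preprocessing step')  # placeholder for the real script

    task = PythonOperator(
        task_id='run_preprocessing',
        python_callable=run_preprocessing,
        dag=dag)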

Airflow will keep showing example dags even after removing it from configuration

Submitted by ♀尐吖头ヾ on 2019-12-05 05:02:44
Airflow example DAGs remain in the UI even after I have set load_examples = False in the config file. The system reports that the DAGs are not present in the DAG folder, but they stay in the UI because the scheduler has marked them as active in the metadata database. I know one way to remove them would be to directly delete the corresponding rows in the database, but of course this is not ideal. How should I proceed to remove these DAGs from the UI? There is currently no way of stopping a deleted DAG from being displayed on the UI except manually deleting the corresponding rows in the DB. The only other
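
If deleting the rows by hand is acceptable, here is a hedged sketch of doing it through Airflow's own SQLAlchemy session rather than raw SQL; model and column names follow Airflow 1.x and the dag_id filter is an assumption you should adjust to the DAG ids shown in your UI:

    from airflow import settings
    from airflow.models import DagModel

    session = settings.Session()
    # Assumption: the shipped examples all have ids starting with 'example_'
    stale = session.query(DagModel).filter(DagModel.dag_id.like('example_%'))
    for dag in stale:
        session.delete(dag)
    session.commit()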

How to get the JobID for the airflow dag runs?

Submitted by 梦想的初衷 on 2019-12-05 04:57:18
When we do a dagrun, in the Airflow UI's "Graph View" we get details of each job run. The JobID is something like "scheduled__2017-04-11T10:47:00". I need this JobID for tracking and log creation, in which I record the time each task/dagrun took. So my question is: how can I get the JobID within the same DAG that is being run? Thanks, Chetan. This value is actually called run_id and can be accessed via the context or macros. In the Python operator this is accessed via the context, and in the bash operator this is accessed via Jinja templating on the bash_command field. More info on what's available
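
A minimal sketch of both access paths mentioned above (the surrounding DAG object is assumed to exist; task ids are illustrative):

    from airflow.operators.python_operator import PythonOperator
    from airflow.operators.bash_operator import BashOperator

    def log_run_id(**context):
        # e.g. 'scheduled__2017-04-11T10:47:00' or 'manual__...'
        run_id = context['run_id']
        print('current run_id: %s' % run_id)

    python_task = PythonOperator(
        task_id='log_run_id',
        python_callable=log_run_id,
        provide_context=True,   # required in Airflow 1.x to receive the context
        dag=dag)

    bash_task = BashOperator(
        task_id='echo_run_id',
        bash_command='echo "run_id is {{ run_id }}"',   # Jinja-templated field
        dag=dag)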

How to wait for an asynchronous event in a task of a DAG in a workflow implemented using Airflow?

Submitted by 岁酱吖の on 2019-12-05 02:47:11
My workflow implemented using Airflow contains tasks A, B, C, and D. I want the workflow to wait at task C for an event. In Airflow, sensors are used to check for some condition by polling for some state; if that condition is true, the next task in the workflow gets triggered. My requirement is to avoid polling. Here, one answer mentions a rest_api_plugin for Airflow which creates a REST API endpoint to trigger the Airflow CLI - using this plugin I can trigger a task in the workflow. In my workflow, however, I want to implement a task that waits for a REST API call (async event) without
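
One pattern that avoids polling entirely is to split the workflow at C: the upstream DAG ends after B, and C and D live in a separate DAG with schedule_interval=None that the external system kicks off (for example via `airflow trigger_dag` or the rest_api_plugin endpoint) when the event actually fires. A hedged sketch of the downstream half, with illustrative ids and placeholder tasks:

    from datetime import datetime
    from airflow.models import DAG
    from airflow.operators.dummy_operator import DummyOperator

    # Never scheduled, only triggered externally when the async event arrives
    downstream = DAG(
        dag_id='wait_free_downstream',
        start_date=datetime(2017, 10, 9),
        schedule_interval=None)

    task_c = DummyOperator(task_id='task_c', dag=downstream)
    task_d = DummyOperator(task_id='task_d', dag=downstream)
    task_c >> task_d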

Apache Airflow unable to establish connect to remote host via FTP/SFTP

Submitted by 混江龙づ霸主 on 2019-12-04 17:33:43
I am new to Apache Airflow and so far, I have been able to work my way through problems I have encountered. I have hit a wall now. I need to transfer files to a remote server via sftp. I have not had any luck doing this. So far, I have gotten S3 and Postgres/Redshift connections via their respective hooks to work in various DAGs. I have been able to use the FTPHook with success testing on my local FTP server, but have not been able to figure out how to use SFTP to connect to a remote host. I am able to connect to the remote host via SFTP with FileZilla, so I know my credentials are correct.
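
The contrib package ships an SFTPHook (built on SSHHook/paramiko) that can be driven from a PythonOperator. A hedged sketch, assuming an 'sftp_remote' connection with host, login and password or key configured, an existing dag object, and illustrative file paths; note that the connection-id argument name differs between Airflow versions:

    from airflow.contrib.hooks.sftp_hook import SFTPHook
    from airflow.operators.python_operator import PythonOperator

    def push_file_to_remote(**context):
        # 'sftp_remote' is an assumed connection id pointing at the remote host
        hook = SFTPHook(ftp_conn_id='sftp_remote')
        hook.store_file(remote_full_path='/incoming/data.csv',   # illustrative paths
                        local_full_path='/tmp/data.csv')

    upload = PythonOperator(
        task_id='upload_via_sftp',
        python_callable=push_file_to_remote,
        provide_context=True,
        dag=dag)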

AssertionError: INTERNAL: No default project is specified

Submitted by 佐手、 on 2019-12-04 10:17:08
New to Airflow. Trying to run the SQL and store the result in a BigQuery table. Getting the following error. Not sure where to set up the default_project_id. Please help me. Error:

    Traceback (most recent call last):
      File "/usr/local/bin/airflow", line 28, in <module>
        args.func(args)
      File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 585, in test
        ti.run(ignore_task_deps=True, ignore_ti_state=True, test_mode=True)
      File "/usr/local/lib/python2.7/dist-packages/airflow/utils/db.py", line 53, in wrapper
        result = func(*args, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages
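
This error usually means the Google Cloud connection used by the BigQuery hook has no project id in its Extras. A hedged sketch of setting it programmatically; the connection id, project id and key path are assumptions, and the extras key names follow the 1.x google_cloud_platform connection type, so they may differ in your version:

    import json
    from airflow import settings
    from airflow.models import Connection

    session = settings.Session()
    conn = session.query(Connection).filter(
        Connection.conn_id == 'bigquery_default').first()   # or your own conn id
    if conn:
        conn.extra = json.dumps({
            'extra__google_cloud_platform__project': 'my-gcp-project',   # assumed project id
            'extra__google_cloud_platform__key_path': '/path/to/key.json'})
        session.commit()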

In airflow, is there a good way to call another dag's task?

Submitted by 試著忘記壹切 on 2019-12-04 04:27:32
Question: I've got dag_prime and dag_tertiary. dag_prime: scans through a directory and intends to call dag_tertiary on each one; currently a PythonOperator. dag_tertiary: scans through the directory passed to it and does (possibly time-intensive) calculations on the contents thereof. I can call the secondary one via a system call from the Python operator, but I feel like there's got to be a better way. I'd also like to consider queuing the dag_tertiary calls, if there's a simple way to do that. Is
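
A common alternative to shelling out is TriggerDagRunOperator, which lets dag_prime enqueue a run of dag_tertiary and hand the directory over through the run's payload (read on the other side from dag_run.conf). A hedged sketch against the 1.x API; the dag_prime object is assumed to exist and the directory path is illustrative:

    from airflow.operators.dagrun_operator import TriggerDagRunOperator

    def pass_directory(context, dag_run_obj):
        # Attach the directory to the triggered run; dag_tertiary reads it
        # from its dag_run.conf. Returning the object confirms the trigger.
        dag_run_obj.payload = {'directory': '/data/scans/batch_001'}  # illustrative path
        return dag_run_obj

    trigger_tertiary = TriggerDagRunOperator(
        task_id='trigger_dag_tertiary',
        trigger_dag_id='dag_tertiary',
        python_callable=pass_directory,
        dag=dag_prime)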