airflow

Airflow s3 connection using UI

Question: I've been trying to use Airflow to schedule a DAG. One of the DAGs includes a task which loads data from an S3 bucket. For that purpose I need to set up an S3 connection, but the UI provided by Airflow isn't that intuitive (http://pythonhosted.org/airflow/configuration.html?highlight=connection#connections). Has anyone succeeded in setting up the S3 connection, and if so, are there any best practices you folks follow? Thanks.

Answer 1: It's hard to find references, but after digging a bit I was able to make it work.
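
One approach that avoids the UI entirely, shown here as a minimal sketch (the connection id my_s3_conn and the credentials are placeholders), is to register the connection programmatically in Airflow's metadata database:

    # Sketch: create an S3 connection without the UI; all values are placeholders.
    from airflow import settings
    from airflow.models import Connection

    conn = Connection(
        conn_id='my_s3_conn',
        conn_type='s3',
        extra='{"aws_access_key_id": "YOUR_KEY", "aws_secret_access_key": "YOUR_SECRET"}',
    )
    session = settings.Session()
    session.add(conn)
    session.commit()

S3-aware hooks and operators can then reference this connection by its id (the exact parameter name, e.g. aws_conn_id or s3_conn_id, depends on the operator and Airflow version).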

Create and use Connections in Airflow operator at runtime [duplicate]

This question already has an answer here: "Is there a way to create/modify connections through Airflow API". Note: this is NOT a duplicate of "Export environment variables at runtime with airflow" or "Set Airflow Env Vars at Runtime".

I have to trigger certain tasks on remote systems from my Airflow DAG. The straightforward way to achieve this is SSHHook. The problem is that the remote system is an EMR cluster which is itself created at runtime (by an upstream task) using EmrCreateJobFlowOperator. So while I can get hold of the job_flow_id of the launched EMR cluster (using XCom), what I
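
One possible approach, sketched below under assumptions (the connection id emr_ssh and the upstream task id get_master_dns are hypothetical), is to have a PythonOperator register an SSH connection in the metadata database once the cluster's master DNS is known, and let downstream SSH tasks reference that connection id:

    # Sketch: create an SSH connection at runtime from a value pulled via XCom.
    # Wire this into a PythonOperator (with provide_context=True on Airflow 1.x).
    from airflow import settings
    from airflow.models import Connection

    def register_emr_ssh_conn(**context):
        master_dns = context['ti'].xcom_pull(task_ids='get_master_dns')  # hypothetical upstream task
        conn = Connection(conn_id='emr_ssh', conn_type='ssh', host=master_dns, login='hadoop')
        session = settings.Session()
        session.add(conn)
        session.commit()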

How to use non-installable modules from DAG code?

Question: I have a Git repository which (among other things) holds Airflow DAGs in an airflow directory. I have a clone of the repository next to an install directory of Airflow. The airflow directory in Git is pointed to by the AIRFLOW_HOME configuration variable. I would like to allow imports from modules in the repository that live outside the airflow folder (please see the structure below).

<repo root>
 |_ airflow
 |    |_ dags
 |         |_ dag.py
 |_ module1
 |_ module2
 |_ ...

So that in dag.py I can do:

from module1 import
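
One common workaround, sketched here under the layout above (the imported name some_function is hypothetical), is to put the repository root on sys.path at the top of dag.py before the import:

    # Sketch: dag.py lives at <repo root>/airflow/dags/dag.py, so the repo root
    # is two directories up from this file.
    import os
    import sys

    REPO_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '..'))
    if REPO_ROOT not in sys.path:
        sys.path.insert(0, REPO_ROOT)

    from module1 import some_function  # hypothetical name inside module1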

Apache Airflow scheduler does not trigger DAG at schedule time

Question: When I schedule DAGs to run at a specific time every day, the DAG execution does not take place at all. However, when I restart the Airflow webserver and scheduler, the DAGs execute once at the scheduled time for that particular day and do not execute from the next day onwards. I am using Airflow version v1.7.1.3 with Python 2.7.6. Here is the DAG code:

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta
import time
n=time
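
A frequent cause of this symptom is a start_date derived from the current time (for example via time or datetime.now()), which keeps pushing the first schedulable interval into the future. For comparison, a minimal sketch of a DAG with a static start_date and an explicit schedule_interval (the dag_id, time, and command are placeholders):

    # Sketch: static start_date in the past; Airflow triggers a run only after a
    # full schedule interval has elapsed beyond it.
    from datetime import datetime, timedelta
    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    dag = DAG(
        dag_id='daily_example',
        start_date=datetime(2019, 11, 1),
        schedule_interval='0 6 * * *',  # every day at 06:00
        default_args={'owner': 'airflow', 'retries': 1, 'retry_delay': timedelta(minutes=5)},
    )

    task = BashOperator(task_id='say_hello', bash_command='echo hello', dag=dag)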

execution_date in airflow: need to access as a variable

Question: I am really a newbie in this forum, but I have been playing with Airflow for some time for our company. Sorry if this question sounds really dumb. I am writing a pipeline using a bunch of BashOperators. Basically, for each task, I want to simply call a REST API using 'curl'. This is what my pipeline looks like (very simplified version):

from airflow import DAG
from airflow.operators import BashOperator, PythonOperator
from dateutil import tz
import datetime
datetime_obj = datetime.datetime
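
Since bash_command is a Jinja-templated field, the execution date can be interpolated directly with Airflow's built-in macros rather than computed by hand. A minimal sketch (the dag_id and the URL are placeholders):

    # Sketch: {{ ds }} renders as the execution date (YYYY-MM-DD);
    # {{ execution_date }} gives the full timestamp.
    import datetime
    from airflow import DAG
    from airflow.operators import BashOperator

    dag = DAG('curl_pipeline', start_date=datetime.datetime(2019, 11, 1), schedule_interval='@daily')

    call_api = BashOperator(
        task_id='call_api',
        bash_command="curl -X GET 'https://example.com/api?date={{ ds }}'",
        dag=dag,
    )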

setting up s3 for logs in airflow

I am using docker-compose to set up a scalable airflow cluster. I based my approach off of this Dockerfile: https://hub.docker.com/r/puckel/docker-airflow/

My problem is getting the logs set up to write/read from s3. When a dag has completed I get an error like this:

*** Log file isn't local.
*** Fetching here: http://ea43d4d49f35:8793/log/xxxxxxx/2017-06-26T11:00:00
*** Failed to fetch log file from worker.
*** Reading remote logs...
Could not read logs from s3://buckets/xxxxxxx/airflow/logs/xxxxxxx/2017-06-26T11:00:00

I set up a new section in the airflow.cfg file like this:

[MyS3Conn]
aws
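
For comparison, one commonly cited setup keeps the remote-logging options under [core] and points them at a connection id rather than embedding credentials in a custom section. A sketch with placeholder values; exact option names vary between Airflow versions:

    [core]
    remote_base_log_folder = s3://my-bucket/airflow/logs
    remote_log_conn_id = MyS3Conn
    encrypt_s3_logs = False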

Airflow ExternalTaskSensor gets stuck

Question: I'm trying to use ExternalTaskSensor and it gets stuck poking another DAG's task, which has already been successfully completed. Here, a first DAG "a" completes its task, and after that a second DAG "b" is supposed to be triggered through an ExternalTaskSensor. Instead it gets stuck poking for a.first_task. First DAG:

import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

dag = DAG(
    dag_id='a',
    default_args={'owner': 'airflow', 'start_date':
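
By default ExternalTaskSensor pokes for a task instance in the external DAG whose execution_date equals the sensor's own, so if the two DAGs are not on identical schedules it never finds one and stays stuck. A sketch of the sensor in DAG "b" with an explicit offset (the timedelta value is a placeholder and must match the real gap between the two schedules):

    import datetime
    from airflow import DAG
    from airflow.operators.sensors import ExternalTaskSensor  # 1.x import path

    dag = DAG(
        dag_id='b',
        default_args={'owner': 'airflow', 'start_date': datetime.datetime(2019, 11, 1)},
        schedule_interval='@daily',
    )

    wait_for_a = ExternalTaskSensor(
        task_id='wait_for_a_first_task',
        external_dag_id='a',
        external_task_id='first_task',
        execution_delta=datetime.timedelta(hours=1),  # placeholder offset between the two schedules
        dag=dag,
    )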

Airflow: how to delete a DAG?

I have started the Airflow webserver and scheduled some DAGs. I can see the DAGs in the web GUI. How can I delete a particular DAG so that it no longer runs and no longer shows in the web GUI? Is there an Airflow CLI command to do that? I looked around but could not find an answer for a simple way of deleting a DAG once it has been loaded and scheduled.

Edit 8/27/18 - Airflow 1.10 is now released on PyPI! https://pypi.org/project/apache-airflow/1.10.0/

How to delete a DAG completely

We have this feature now in Airflow ≥ 1.10! The PR #2199 (Jira: AIRFLOW-1002) adding DAG removal to Airflow has now been merged, which allows
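
On Airflow ≥ 1.10 this typically boils down to the CLI (the dag_id below is a placeholder); note that the DAG file itself still has to be removed from the dags folder or it will be re-parsed:

    # removes the DAG's metadata (runs, task instances, etc.) from the database
    airflow delete_dag my_dag_id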

How to run Airflow on Windows

Question: The usual instructions for running Airflow do not apply in a Windows environment:

# airflow needs a home, ~/airflow is the default,
# but you can lay foundation somewhere else if you prefer
# (optional)
export AIRFLOW_HOME=~/airflow

# install from pypi using pip
pip install airflow

# initialize the database
airflow initdb

# start the web server, default port is 8080
airflow webserver -p 8080

The Airflow utility is not available in the command line and I can't find it elsewhere to be manually
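
One commonly suggested workaround, sketched here under the assumption of an Ubuntu shell inside the Windows Subsystem for Linux (WSL) rather than native Windows, is to run the standard quickstart from that Linux environment:

    # inside a WSL shell (e.g. Ubuntu), not cmd.exe or PowerShell
    export AIRFLOW_HOME=~/airflow
    pip install apache-airflow   # the PyPI package was later renamed from "airflow" to "apache-airflow"
    airflow initdb
    airflow webserver -p 8080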