airflow

Running Job On Airflow Based On Webrequest

Submitted by 我与影子孤独终老i on 2019-12-03 09:57:18
I wanted to know if Airflow tasks can be executed upon getting a request over HTTP. I am not interested in the scheduling part of Airflow; I just want to use it as a substitute for Celery. An example operation would be something like this: a user submits a form requesting some report. The backend receives the request and sends the user a notification that the request has been received. The backend then schedules a job using Airflow to run immediately. Airflow then executes a series of tasks associated with a DAG. For example, pull data from Redshift first, pull data from MySQL, make some …
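
A minimal sketch of how such a backend could hand work off to Airflow (not from the original post): the web endpoint shells out to the trigger_dag CLI, and the DAG itself would use schedule_interval=None so it only runs when triggered. The Flask app, the /request-report route, and the DAG id report_dag are assumptions for illustration.

```python
import json
import subprocess
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route('/request-report', methods=['POST'])
def request_report():
    # Forward the user's form data to the DAG run as its conf payload.
    conf = json.dumps(request.get_json(silent=True) or {})
    # Equivalent to: airflow trigger_dag -c '<conf>' report_dag
    subprocess.check_call(['airflow', 'trigger_dag', '-c', conf, 'report_dag'])
    # Acknowledge immediately; Airflow executes the DAG's tasks asynchronously.
    return jsonify({'status': 'report requested'}), 202

if __name__ == '__main__':
    app.run(port=5000)
```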

Is it possible for Airflow scheduler to first finish the previous day's cycle before starting the next?

Submitted by 别来无恙 on 2019-12-03 09:23:07
Question: Right now, nodes in my DAG proceed to the next day's task before the rest of the nodes of that DAG finish. Is there a way to make it wait for the rest of the DAG to finish before moving on to the next day's DAG cycle? (I do have depends_on_past set to true, but that does not work in this case.) My DAG looks roughly like a single upstream node feeding a chain O -> O -> O -> O -> O. [tree view pic of the DAG] Answer 1: Might be a bit late for this answer, but I ran into the same issue, and the way I resolved it is I added two extra …
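
One way to get this behaviour (a sketch, assuming a strictly serial schedule is acceptable) is to cap the DAG at a single active run with max_active_runs=1, so the next day's cycle cannot start until the previous run has finished; the task names below are placeholders.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

dag = DAG(
    'daily_pipeline',
    start_date=datetime(2019, 1, 1),
    schedule_interval='@daily',
    max_active_runs=1,                       # only one DagRun in flight at a time
    default_args={'depends_on_past': True},  # each task also waits on its own previous instance
)

extract = DummyOperator(task_id='extract', dag=dag)
transform = DummyOperator(task_id='transform', dag=dag)
load = DummyOperator(task_id='load', dag=dag)

extract >> transform >> load
```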

Airflow depends_on_past for whole DAG

Submitted by 匿名 (anonymous, unverified) on 2019-12-03 09:13:36
Question: Is there a way in Airflow to apply depends_on_past to an entire DagRun, not just to a task? I have a daily DAG, and the Friday DagRun errored on the 4th task; however, the Saturday and Sunday DagRuns still ran as scheduled. Using depends_on_past = True would have paused the DagRun on the same 4th task, but the first 3 tasks would still have run. I can see in the DagRun DB table that there is a state column that contains failed for the Friday DagRun. What I want is a way of configuring a DagRun to not start if the previous DagRun …
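
One possible approach (an assumption, not necessarily the accepted answer): put a gating task at the head of the DAG that fails unless the previous DagRun finished successfully, so none of the downstream tasks start when the prior cycle is broken.

```python
from datetime import datetime
from airflow import DAG
from airflow.models import DagRun
from airflow.operators.python_operator import PythonOperator

def previous_run_succeeded(dag_id, execution_date, **_):
    # Look up earlier runs of this DAG and fail if the most recent one is not 'success'.
    runs = [r for r in DagRun.find(dag_id=dag_id) if r.execution_date < execution_date]
    if runs and sorted(runs, key=lambda r: r.execution_date)[-1].state != 'success':
        raise ValueError('Previous DagRun did not succeed; refusing to start this run.')

dag = DAG('daily_dag', start_date=datetime(2019, 1, 1), schedule_interval='@daily')

gate = PythonOperator(
    task_id='check_previous_run',
    python_callable=previous_run_succeeded,
    op_kwargs={'dag_id': 'daily_dag'},
    provide_context=True,  # supplies execution_date to the callable (Airflow 1.x)
    dag=dag,
)
# downstream tasks would be chained after `gate`
```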

How to access the response from Airflow SimpleHttpOperator GET request

Submitted by 匿名 (anonymous, unverified) on 2019-12-03 09:06:55
Question: I'm learning Airflow and have a simple question. Below is my DAG, called dog_retriever:

import airflow
from airflow import DAG
from airflow.operators.http_operator import SimpleHttpOperator
from airflow.operators.sensors import HttpSensor
from datetime import datetime, timedelta
import json

default_args = {
    'owner': 'Loftium',
    'depends_on_past': False,
    'start_date': datetime(2017, 10, 9),
    'email': 'rachel@loftium.com',
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 3,
    'retry_delay': timedelta(minutes=3),
}

dag = DAG('dog …
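
The usual pattern for this (a sketch; the downstream task and the example endpoint are illustrative) is to let the SimpleHttpOperator push its response body to XCom with xcom_push=True, then pull it from a later task.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.http_operator import SimpleHttpOperator
from airflow.operators.python_operator import PythonOperator

# Stand-in for the dog_retriever DAG from the question.
dag = DAG('dog_retriever', start_date=datetime(2017, 10, 9), schedule_interval=None)

get_dog = SimpleHttpOperator(
    task_id='get_dog',
    http_conn_id='http_default',         # connection pointing at the API host
    endpoint='api/breeds/image/random',
    method='GET',
    xcom_push=True,                      # store the response body in XCom (Airflow 1.x)
    dag=dag,
)

def print_response(**context):
    body = context['ti'].xcom_pull(task_ids='get_dog')
    print(body)

show_response = PythonOperator(
    task_id='print_response',
    python_callable=print_response,
    provide_context=True,
    dag=dag,
)

get_dog >> show_response
```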

Apache Airflow or Apache Beam for data processing and job scheduling

Submitted by 只愿长相守 on 2019-12-03 08:28:02
Question: I'm trying to give useful information, but I am far from being a data engineer. I am currently using the Python library pandas to execute a long series of transformations on my data, which has a lot of inputs (currently CSV and Excel files). The outputs are several Excel files. I would like to be able to execute scheduled, monitored batch jobs with parallel computation (I mean not as sequential as what I'm doing with pandas), once a month. I don't really know Beam or Airflow; I quickly read …
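
For orientation only (file names and the transformation are invented): a pandas workload like this maps naturally onto a monthly-scheduled Airflow DAG in which each independent transformation is its own task, so the tasks can run in parallel and be monitored individually.

```python
from datetime import datetime
import pandas as pd
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def transform(input_csv, output_xlsx):
    df = pd.read_csv(input_csv)
    # ... pandas transformations would go here ...
    df.to_excel(output_xlsx, index=False)

dag = DAG(
    'monthly_pandas_jobs',
    start_date=datetime(2019, 1, 1),
    schedule_interval='@monthly',
)

# One task per input file; independent tasks can run in parallel.
for name in ['sales', 'inventory', 'finance']:
    PythonOperator(
        task_id='transform_{}'.format(name),
        python_callable=transform,
        op_kwargs={'input_csv': '/data/{}.csv'.format(name),
                   'output_xlsx': '/data/{}.xlsx'.format(name)},
        dag=dag,
    )
```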

Airflow user creation

Submitted by 风格不统一 on 2019-12-03 08:06:47
I am using Airflow version 1.8.2 and have set up a couple of DAGs. Everything is running as expected. I have an admin user created for Airflow web server access, but for other teams to monitor their jobs we can't hand out this admin user, so I tried to create a different user from the UI at '/admin/user/'. However, only the following fields are available; there are no options to provide roles or a password, etc. Has anyone faced the same issue, or am I doing something wrong? How do I create role-based users so that I can tag specific DAGs to those teams? Thanks. As of Airflow 1.10, there is an airflow create_user CLI: https:/ …
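
For reference, the Airflow 1.10 command mentioned above looks roughly like this (it requires the RBAC UI, i.e. rbac = True under [webserver] in airflow.cfg; the role and user details below are placeholders):

```
airflow create_user --role Viewer --username team_a \
    --firstname Team --lastname A \
    --email team_a@example.com --password changeme
```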

Unable to execute spark job using SparkSubmitOperator

Submitted by 不想你离开。 on 2019-12-03 07:33:53
I am able to run a Spark job using BashOperator, but I want to use SparkSubmitOperator for it, using Spark standalone mode. Here's my DAG for SparkSubmitOperator, and the stack trace:

args = {
    'owner': 'airflow',
    'start_date': datetime(2018, 5, 24)
}
dag = DAG('spark_job', default_args=args, schedule_interval="*/10 * * * *")

operator = SparkSubmitOperator(
    task_id='spark_submit_job',
    application='/home/ubuntu/test.py',
    total_executor_cores='1',
    executor_cores='1',
    executor_memory='2g',
    num_executors='1',
    name='airflow-spark',
    verbose=False,
    driver_memory='1g',
    conf={'master': 'spark://xx.xx.xx.xx:7077'} …
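
Since the stack trace is cut off above, this is only a guess at the usual fix: with SparkSubmitOperator the standalone master is normally taken from an Airflow connection rather than from conf, so a sketch would define a connection (e.g. one named spark_standalone whose host is spark://<master-ip>:7077) and reference it via conn_id.

```python
from datetime import datetime
from airflow import DAG
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator

args = {'owner': 'airflow', 'start_date': datetime(2018, 5, 24)}
dag = DAG('spark_job', default_args=args, schedule_interval='*/10 * * * *')

operator = SparkSubmitOperator(
    task_id='spark_submit_job',
    conn_id='spark_standalone',      # connection holding spark://<master-ip>:7077
    application='/home/ubuntu/test.py',
    total_executor_cores='1',
    executor_cores='1',
    executor_memory='2g',
    num_executors='1',
    driver_memory='1g',
    name='airflow-spark',
    verbose=False,
    dag=dag,
)
```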

How to allow airflow dags for concrete user(s) only

Submitted by 我只是一个虾纸丫 on 2019-12-03 07:25:28
The problem is pretty simple. I need to limit Airflow web users to seeing and executing only certain DAGs and tasks. If possible, I'd prefer not to use Kerberos or OAuth. The multi-tenancy option seems like the way to go, but I couldn't make it work the way I expect. My current setup:
- added Airflow web users test and ikar via Web Authentication / Password
- my unix username is ikar, with a home in /home/ikar
- no test unix user
- airflow 1.8.2 is installed in /home/ikar/airflow
- added two DAGs with one task each: one with owner set to ikar, one with owner set to test

cat airflow.cfg:
[core]
# The home folder …
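
For this kind of setup, the relevant airflow.cfg switches (stated as an assumption about what the multi-tenancy option intends, not a verified fix for this case) are password authentication plus filter_by_owner, so each web user only sees DAGs whose owner matches their username:

```
[webserver]
authenticate = True
auth_backend = airflow.contrib.auth.backends.password_auth
filter_by_owner = True
```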

How to Connect Airflow to oracle database

Submitted by 耗尽温柔 on 2019-12-03 07:13:28
I am trying to create a connection to an Oracle DB instance (oracle:thin) using Airflow. According to their documentation, I entered my hostname followed by port number and SID:
Host: example.com:1524/sid
and filled in the other fields as:
Conn Type: Oracle
Schema: username (the documentation says: use your username for schema)
Login: username
Password: * * *
After the connection is set up, it gives the same error code for every query that I tried to execute (ORA-12514). It seems like Oracle doesn't let Airflow connect: ORA-12514: TNS:listener does not currently know of service requested in connect …
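
ORA-12514 usually means the listener knows the database by a service name rather than by the SID entered. A hedged way to check (host, port, and service name below are placeholders) is to build the DSN with cx_Oracle directly, using service_name, before wiring the same values into the Airflow connection:

```python
import cx_Oracle

# Build a DSN from host, port, and the service name registered with the listener.
dsn = cx_Oracle.makedsn('example.com', 1524, service_name='my_service')
conn = cx_Oracle.connect(user='username', password='secret', dsn=dsn)
print(conn.version)
conn.close()
```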

Accessing configuration parameters passed to Airflow through CLI

Submitted by 你说的曾经没有我的故事 on 2019-12-03 06:10:09
Question: I am trying to pass the following configuration parameters to the Airflow CLI while triggering a DAG run. This is the trigger_dag command I am using:
airflow trigger_dag -c '{"account_list":"[1,2,3,4,5]", "start_date":"2016-04-25"}' insights_assembly_9900
My problem is: how can I access the conf parameters passed, from inside an operator in the DAG run? Answer 1: This is probably a continuation of the answer provided by devj. In airflow.cfg, the following property should be set to true: dag_run_conf …
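
A minimal sketch of how the -c payload is usually read inside the DAG (task ids are illustrative): templated fields can reference dag_run.conf directly, and a PythonOperator with provide_context=True receives the DagRun object in its context.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator

dag = DAG('insights_assembly_9900', start_date=datetime(2016, 4, 1), schedule_interval=None)

# Templated access to the conf passed via `airflow trigger_dag -c ...`
print_accounts = BashOperator(
    task_id='print_accounts',
    bash_command='echo {{ dag_run.conf["account_list"] }}',
    dag=dag,
)

def use_conf(**context):
    # dag_run.conf is the dict passed on the command line (empty for scheduled runs)
    conf = context['dag_run'].conf or {}
    print(conf.get('start_date'), conf.get('account_list'))

use_conf_task = PythonOperator(
    task_id='use_conf',
    python_callable=use_conf,
    provide_context=True,
    dag=dag,
)
```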