airflow

How can I control the parallelism or concurrency of an Airflow DAG?

Submitted by 这一生的挚爱 on 2019-12-18 11:36:17
Question: In some of my Airflow installations, DAGs or tasks that are scheduled to run do not run even when the scheduler is not fully loaded. How can I increase the number of DAGs or tasks that can run concurrently? Similarly, if my installation is under high load and I want to limit how quickly my Airflow workers pull queued tasks, what can I adjust?

Answer 1: Here's an expanded list of the configuration options available in Airflow v1.10.2. Some can be set on a per-DAG or per-operator basis, and may …
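
For illustration, here is a minimal sketch of the per-DAG and per-operator knobs (the argument names are real Airflow 1.10.x parameters, but the DAG id, schedule, and pool name are invented for the example). The installation-wide counterparts live in airflow.cfg: core.parallelism, core.dag_concurrency, core.max_active_runs_per_dag, and celery.worker_concurrency, the last of which controls how many queued tasks each Celery worker picks up.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    'throttled_example',             # hypothetical DAG id
    start_date=datetime(2019, 1, 1),
    schedule_interval='@daily',
    concurrency=4,                   # max running task instances of this DAG across all runs
    max_active_runs=1,               # max concurrent DAG runs of this DAG
)

work = BashOperator(
    task_id='work',
    bash_command='sleep 10',
    task_concurrency=2,              # max concurrent instances of this one task
    pool='my_pool',                  # hypothetical pool created under Admin -> Pools
    dag=dag,
)
```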

How to restart a failed task on Airflow

Submitted by 别说谁变了你拦得住时间么 on 2019-12-18 10:50:11
Question: I am using a LocalExecutor and my DAG has 3 tasks, where task C is dependent on task A. Task A and task B can run in parallel, something like this:

A --> C
B

Task A has failed, but task B ran fine. Task C has not run yet because task A failed. My question is: how do I re-run task A alone, so that task C runs once task A completes and the Airflow UI marks them as success?

Answer 1: In the UI: go to the DAG, then to the DAG run you want to change; click on Graph View; click on task A; click "Clear" …
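
The same thing can also be done without the UI: the CLI form is roughly airflow clear <dag_id> -t '^A$' --downstream, and the sketch below shows the programmatic equivalent (the DAG id and the date window are placeholders, not values from the question).

```python
from datetime import datetime

from airflow.models import DagBag

dagbag = DagBag()                        # parses the DAGs folder
dag = dagbag.get_dag('my_dag_id')        # placeholder DAG id

# Narrow the DAG down to task A plus everything downstream of it (here: C),
# then clear those task instances so the scheduler will run them again.
sub = dag.sub_dag(task_regex='^A$',
                  include_upstream=False,
                  include_downstream=True)
sub.clear(start_date=datetime(2019, 12, 1),   # placeholder execution-date window
          end_date=datetime(2019, 12, 1))
```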

Problem updating the connections in Airflow programmatically

Submitted by 安稳与你 on 2019-12-18 09:55:06
Question: I am trying to update Airflow connections using Python. I have created a Python function that takes an authentication token from an API and updates the extra field of a connection in Airflow. I am getting the token in JSON format like below: { "token": token_value }. Below is the part of the Python code that I am using:

```python
def set_token():
    # Get token from API & update the Airflow Variables
    Variable.set("token", str(auth_token))
    new_token = Variable.get("token")
    get_conn = Connection(conn_id="test_conn")
```
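
For reference, here is a minimal sketch of one common way to update a connection's extra field from Python, using an Airflow ORM session; the conn_id "test_conn" is taken from the question, while the helper name and the assumption that the token string is already in hand are mine.

```python
import json

from airflow import settings
from airflow.models import Connection

def update_conn_extra(conn_id, token):
    """Store the token in the connection's `extra` JSON field."""
    session = settings.Session()
    try:
        conn = (session.query(Connection)
                       .filter(Connection.conn_id == conn_id)
                       .one())
        conn.set_extra(json.dumps({"token": token}))
        session.add(conn)
        session.commit()
    finally:
        session.close()

# e.g. update_conn_extra("test_conn", new_token)
```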

Fusing operators together

Submitted by 浪子不回头ぞ on 2019-12-18 06:52:42
Question: I'm still in the process of deploying Airflow and I've already felt the need to merge operators together. The most common use case would be coupling an operator and the corresponding sensor. For instance, one might want to chain together the EmrStepOperator and EmrStepSensor. I'm creating my DAGs programmatically, and the biggest one of those contains 150+ (identical) branches, each performing the same series of operations on different bits of data (tables). Therefore clubbing together …
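
As one possible pattern (a generic sketch, not this thread's answer), the operator and its sensor can be wrapped in a single custom operator whose execute() runs the wrapped operator first and then the sensor's poke loop; the wrapped tasks would be constructed without dag= so that only the fused task is registered. Note that this trades away per-step retries and separate logs for the two halves.

```python
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults

class FusedOperator(BaseOperator):
    """Run a wrapped operator, then block until its companion sensor succeeds."""

    @apply_defaults
    def __init__(self, wrapped_operator, wrapped_sensor, *args, **kwargs):
        super(FusedOperator, self).__init__(*args, **kwargs)
        self.wrapped_operator = wrapped_operator
        self.wrapped_sensor = wrapped_sensor

    def execute(self, context):
        self.wrapped_operator.execute(context)
        # BaseSensorOperator.execute() runs the poke loop until success or timeout
        self.wrapped_sensor.execute(context)
```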

Email on failure using AWS SES in Apache Airflow DAG

Submitted by 笑着哭i on 2019-12-18 05:22:08
Question: I am trying to have Airflow email me using AWS SES whenever a task in my DAG fails to run or retries. I am using my AWS SES credentials rather than my general AWS credentials. My current airflow.cfg:

```ini
[email]
email_backend = airflow.utils.email.send_email_smtp

[smtp]
# If you want airflow to send emails on retries, failure, and you want to use
# the airflow.utils.email.send_email_smtp function, you have to configure an
# smtp server here
smtp_host = emailsmtpserver.region.amazonaws
```
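
Assuming the [smtp] section is pointed at the SES SMTP endpoint for your region (typically email-smtp.<region>.amazonaws.com on port 587 with STARTTLS, using SES SMTP credentials and an SES-verified smtp_mail_from), the DAG side only needs email settings in default_args. A minimal sketch, with a made-up recipient and DAG id:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2019, 1, 1),
    'email': ['alerts@example.com'],   # hypothetical recipient; must be SES-verified in sandbox mode
    'email_on_failure': True,
    'email_on_retry': True,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG('ses_email_example', default_args=default_args,
          schedule_interval='@daily')

might_fail = BashOperator(task_id='might_fail', bash_command='exit 1', dag=dag)
```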

How to run bash script file in Airflow

Submitted by 你离开我真会死。 on 2019-12-17 23:18:07
Question: I have a bash script that creates a file (if it does not exist) that I want to run in Airflow, but when I try it, it fails. How do I do this?

```bash
#!/bin/bash
# create_file.sh
file=filename.txt
if [ ! -e "$file" ] ; then
    touch "$file"
fi
if [ ! -w "$file" ] ; then
    echo cannot write to $file
    exit 1
fi
```

and here's how I'm calling it in Airflow:

```python
create_command = """
./scripts/create_file.sh
"""

t1 = BashOperator(
    task_id='create_file',
    bash_command=create_command,
    dag=dag
)
```

lib/python2.7/site-packages …
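
A common fix for this kind of failure (a sketch reusing the question's dag object, not necessarily what the asker ended up doing) is to give the BashOperator an absolute path to the script and to add a trailing space after it, since a bash_command ending in ".sh" is otherwise interpreted as a Jinja template file:

```python
from airflow.operators.bash_operator import BashOperator

# The path below is hypothetical; point it at wherever create_file.sh lives.
# The trailing space matters: a bash_command that ends in ".sh" is treated by
# the BashOperator as a Jinja template file to load, which typically fails
# with "TemplateNotFound". A trailing space makes it run as a plain command.
t1 = BashOperator(
    task_id='create_file',
    bash_command='/home/airflow/scripts/create_file.sh ',
    dag=dag,
)
# The script must also be executable (chmod +x create_file.sh).
```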

How to run Spark code in Airflow?

Submitted by 梦想的初衷 on 2019-12-17 22:06:06
Question: Hello people of the Earth! I'm using Airflow to schedule and run Spark tasks. All I have found so far is Python DAGs that Airflow can manage. DAG example:

```python
# spark_count_lines.py
import logging
from airflow import DAG
from airflow.operators import PythonOperator
from datetime import datetime

args = {
    'owner': 'airflow',
    'start_date': datetime(2016, 4, 17),
    'provide_context': True,
}

dag = DAG(
    'spark_count_lines',
    start_date=datetime(2016, 4, 17),
    schedule_interval='@hourly',
    default…
```
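
One way to launch Spark from a DAG (a sketch under the assumption that a Spark connection has been configured; a plain BashOperator running spark-submit works as well) is the contrib SparkSubmitOperator:

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator

dag = DAG(
    'spark_count_lines_submit',        # hypothetical DAG id
    start_date=datetime(2016, 4, 17),
    schedule_interval='@hourly',
)

count_lines = SparkSubmitOperator(
    task_id='count_lines',
    application='/path/to/count_lines.py',   # hypothetical PySpark script
    conn_id='spark_default',                 # an Airflow connection pointing at your Spark cluster
    dag=dag,
)
```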

DAGs not clickable on Google Cloud Composer webserver, but working fine on a local Airflow

Submitted by 左心房为你撑大大i on 2019-12-17 19:51:31
Question: I'm using Google Cloud Composer (managed Airflow on Google Cloud Platform) with image version composer-0.5.3-airflow-1.9.0 and Python 2.7, and I'm facing a weird issue: after importing my DAGs, they are not clickable from the web UI (and there are no "Trigger DAG", "Graph View", ... buttons), while everything works perfectly when running a local Airflow. Even though they are not usable from the webserver on Composer, my DAGs still exist. I can list them using the CLI (list_dags), describe them (list_tasks), and …
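
One frequent cause of this symptom, though not necessarily the asker's, is dynamically generated DAGs that are never assigned to module-level names; on Composer the webserver runs in a separate tenant project and has to be able to parse the DAG files on its own. A minimal sketch of the usual pattern, with made-up table names:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

def create_dag(dag_id):
    dag = DAG(dag_id, start_date=datetime(2018, 1, 1),
              schedule_interval='@daily')
    DummyOperator(task_id='noop', dag=dag)
    return dag

# Registering each generated DAG as a module-level global is what makes the
# webserver treat it as a "real", clickable DAG.
for table in ['table_a', 'table_b']:          # hypothetical table list
    dag_id = 'process_{}'.format(table)
    globals()[dag_id] = create_dag(dag_id)
```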

Make custom Airflow macros expand other macros

Submitted by ╄→гoц情女王★ on 2019-12-17 18:43:12
Question: Is there any way to make a user-defined macro in Airflow which is itself computed from other macros?

```python
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    'simple',
    schedule_interval='0 21 * * *',
    user_defined_macros={
        'next_execution_date': '{{ dag.following_schedule(execution_date) }}',
    },
)

task = BashOperator(
    task_id='bash_op',
    bash_command='echo "{{ next_execution_date }}"',
    dag=dag,
)
```

The use case here is to back-port the new Airflow v1.8 next_execution_date macro …
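
One way to get the intended expansion (a sketch of the commonly suggested approach, with a made-up start_date added so it is self-contained) is to make the user-defined macro a callable that receives the template context objects as arguments, so Jinja evaluates it at render time instead of leaving the nested template string unrendered:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

def compute_next_execution_date(dag, execution_date):
    # following_schedule() returns the schedule point after the given execution date
    return dag.following_schedule(execution_date)

dag = DAG(
    'simple',
    start_date=datetime(2017, 1, 1),
    schedule_interval='0 21 * * *',
    user_defined_macros={
        'next_execution_date': compute_next_execution_date,
    },
)

task = BashOperator(
    task_id='bash_op',
    bash_command='echo "{{ next_execution_date(dag, execution_date) }}"',
    dag=dag,
)
```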

Python Airflow - Return result from PythonOperator

Submitted by 天大地大妈咪最大 on 2019-12-17 17:59:06
Question: I have written a DAG with multiple PythonOperators:

```python
task1 = af_op.PythonOperator(task_id='Data_Extraction_Environment',
                             provide_context=True,
                             python_callable=Task1, dag=dag1)

def Task1(**kwargs):
    return(kwargs['dag_run'].conf.get('file'))
```

From the PythonOperator I am calling the "Task1" method. That method returns a value, and I need to pass that value to the next PythonOperator. How can I get the value from the "task1" variable, or how can I get the value which is returned from the Task1 method? Updated: …
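
The usual mechanism for this is XCom: whatever a PythonOperator's callable returns is pushed to XCom automatically, and a downstream task created with provide_context=True can pull it by the upstream task_id. A minimal sketch reusing the question's task_id and dag1 (the second task's name is made up):

```python
from airflow.operators.python_operator import PythonOperator

def task1(**kwargs):
    # Whatever a PythonOperator callable returns is pushed to XCom
    # under the key 'return_value'.
    return kwargs['dag_run'].conf.get('file')

def task2(**kwargs):
    ti = kwargs['ti']
    file_name = ti.xcom_pull(task_ids='Data_Extraction_Environment')
    print('received from upstream: {}'.format(file_name))

# dag1 is the DAG object from the question
t1 = PythonOperator(task_id='Data_Extraction_Environment',
                    provide_context=True, python_callable=task1, dag=dag1)
t2 = PythonOperator(task_id='use_the_value',
                    provide_context=True, python_callable=task2, dag=dag1)
t1 >> t2
```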