airflow

Airflow External sensor gets stuck at poking

为君一笑 posted on 2019-12-01 15:13:46
Question: I want one DAG to start after another DAG completes. One solution is to use an external task sensor; below you can find my approach. The problem I encounter is that the dependent DAG gets stuck at poking. I checked this answer and made sure that both DAGs run on the same schedule. My simplified code is as follows; any help would be appreciated.

Leader DAG:
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta
default
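
A minimal sketch of the pattern described above, assuming a leader DAG with id leader_dag whose final task is leader_task, and a dependent DAG that waits on it with ExternalTaskSensor (all ids here are hypothetical, not taken from the truncated code):

# Dependent DAG: waits for a task in the leader DAG before doing its own work.
# ExternalTaskSensor only succeeds when the leader DAG has a run with the SAME
# execution_date (or one shifted by execution_delta), which is why mismatched
# schedules leave the sensor stuck at "poking".
from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.sensors.external_task_sensor import ExternalTaskSensor  # Airflow 1.10 path

default_args = {'owner': 'airflow', 'start_date': datetime(2019, 1, 1)}

# Both DAGs must share the same schedule_interval for the default sensor behaviour.
with DAG('dependent_dag', default_args=default_args, schedule_interval='@daily') as dag:
    wait_for_leader = ExternalTaskSensor(
        task_id='wait_for_leader',
        external_dag_id='leader_dag',      # hypothetical leader DAG id
        external_task_id='leader_task',    # hypothetical final task of the leader DAG
        poke_interval=60,                  # seconds between pokes
    )
    do_work = BashOperator(task_id='do_work', bash_command='echo "leader finished"')
    wait_for_leader >> do_work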

Which version of MySQL is compatible with Airflow version 1.10?

不想你离开。 posted on 2019-12-01 14:32:59
I am trying to use LocalExecutor instead of the default SequentialExecutor, which requires a database other than SQLite. I wanted to try MySQL; however, I am seeing issues with MySQL versions 5.6 and 5.7, and I am not sure whether they are related to version compatibility. I would love to see any documentation relating Airflow versions to compatible MySQL versions. Update: here is the "Ooops" error I see in the UI when clicking any of the DAG-related buttons while using the MySQL backend: Traceback (most recent call last): File "/home/ec2-user/.local/lib/python2.7/site-packages/flask/app.py", line 1982, in
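
For reference, a minimal sketch of the relevant airflow.cfg settings for pointing Airflow at a MySQL metadata database with LocalExecutor; the host, user, password, and database name are placeholders, not values from the question:

# airflow.cfg (excerpt) -- placeholder connection details, adjust to your setup
[core]
executor = LocalExecutor
# mysqlclient-style connection string; the database must already exist
sql_alchemy_conn = mysql://airflow_user:airflow_pass@localhost:3306/airflow

On the MySQL side, the Airflow 1.10 setup docs also call for explicit_defaults_for_timestamp = 1 in the server configuration when running against MySQL 5.7; without it, airflow initdb typically stops with an error about that variable.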

Airflow run tasks at different times in the same dag?

一曲冷凌霜 posted on 2019-12-01 11:23:33
I have 30 individual tasks in a DAG with no dependencies between each other. The tasks run the same code; the only difference is the data volume, so some tasks finish in seconds while others take 2 hours or more. The problem is that during catchup, the tasks that finish in seconds are blocked by the tasks that take hours before they can move on to the next execution date. I could break them up into individual DAGs, but that seems silly, and the 30 tasks will grow to a bigger number in the future. Is there any way to run tasks in the same DAG at different execution times? Like, as soon as a task
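
One commonly suggested mitigation (not from the truncated question itself) is to keep depends_on_past=False and raise max_active_runs so several execution dates can be in flight at once; then the quick tasks of later runs are not held back by the slow tasks of earlier runs. A sketch, with a hypothetical process_partition callable and made-up task ids:

from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def process_partition(partition, **kwargs):
    # hypothetical worker: same code for every task, different data volume
    print("processing partition", partition)

default_args = {'owner': 'airflow', 'depends_on_past': False,
                'start_date': datetime(2019, 1, 1)}

# Allowing several DAG runs to be active at once lets the fast tasks of later
# execution dates start while the slow tasks of earlier dates are still running.
dag = DAG('thirty_tasks', default_args=default_args,
          schedule_interval='@daily', max_active_runs=4)

for i in range(30):
    PythonOperator(
        task_id='task_{}'.format(i),
        python_callable=process_partition,
        op_kwargs={'partition': i},
        dag=dag,
    )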

Airflow does not backfill latest run

不羁岁月 posted on 2019-12-01 11:20:10
For some reason, Airflow doesn't seem to trigger the latest run for a DAG with a weekly schedule interval.

Current date:
$ date
Tue Aug 9 17:09:55 UTC 2016

DAG:
from datetime import datetime
from datetime import timedelta
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id='superdag',
    start_date=datetime(2016, 7, 18),
    schedule_interval=timedelta(days=7),
    default_args={
        'owner': 'Jon Doe',
        'depends_on_past': False
    }
)

BashOperator(
    task_id='print_date',
    bash_command='date',
    dag=dag
)

Run the scheduler:
$ airflow scheduler -d superdag

You'd expect a
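
This behaviour usually comes down to Airflow's scheduling semantics: the run for a given execution_date is only triggered once that whole interval has elapsed. A small worked sketch of the run dates for the DAG above (this follows Airflow's documented "run at the end of the interval" rule, not anything stated in the truncated question):

from datetime import datetime, timedelta

start_date = datetime(2016, 7, 18)
interval = timedelta(days=7)
now = datetime(2016, 8, 9, 17, 9, 55)

# Airflow triggers the run for execution_date X only after X + schedule_interval
# has passed, so the newest schedulable execution_date is the last interval that
# has fully elapsed.
execution_date = start_date
while execution_date + interval <= now:
    print("run with execution_date", execution_date.date())
    execution_date += interval
# Prints 2016-07-18, 2016-07-25, 2016-08-01; the 2016-08-08 run would only be
# triggered after 2016-08-15, which is why the "latest" run appears to be missing.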

Tasks added to DAG during runtime fail to be scheduled

非 Y 不嫁゛ posted on 2019-12-01 10:37:51
My idea is to have a task foo that generates a list of inputs (users, reports, log files, etc.), and to launch a task for every element in that list. The goal is to make use of Airflow's retry and other logic instead of reimplementing it. So, ideally, my DAG should look something like this: The only variable here is the number of tasks generated. I want to do some more tasks after all of these are completed, so spinning up a new DAG for every task does not seem appropriate. This is my code: default_args = { 'owner': 'airflow', 'depends_on_past': False, 'start_date': datetime(2015, 6
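
Tasks have to exist when the DAG file is parsed; operators created at runtime inside another task are not registered with the scheduler. A hedged sketch of the usual workaround, where the input list is computed (or read from a known source) at parse time and the per-element tasks plus a join task are built in a loop; get_inputs, process, and all ids below are hypothetical:

from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.operators.dummy_operator import DummyOperator

def get_inputs():
    # hypothetical: in practice this might read a config file, an Airflow
    # Variable, or a database table available at DAG-parse time
    return ['user_a', 'user_b', 'user_c']

def process(element, **kwargs):
    print("processing", element)

default_args = {'owner': 'airflow', 'depends_on_past': False,
                'start_date': datetime(2015, 6, 1)}

dag = DAG('dynamic_fanout', default_args=default_args, schedule_interval='@daily')

# The fan-out is fixed at parse time, so every generated task is visible
# to the scheduler and gets Airflow's normal retry handling.
join = DummyOperator(task_id='join', dag=dag)
for element in get_inputs():
    task = PythonOperator(
        task_id='process_{}'.format(element),
        python_callable=process,
        op_kwargs={'element': element},
        dag=dag,
    )
    task >> join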

airflow Operators

你。 posted on 2019-12-01 09:38:31
airflow Operators 20190927

I. Steps for writing a DAG
1. Import the DAG class, the operator classes you need, and any required Python modules.
2. Set the default arguments and create the DAG object.
3. Provide the required arguments (such as task_id and dag) and create the Tasks (i.e. Operator objects).
4. Set the upstream/downstream dependencies between the Tasks.

1. Import the DAG class
import airflow
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import timedelta

2. Set some default arguments
All Operators derive from BaseOperator and gain additional functionality through inheritance. See [airflow operators - CSDN]. default_args holds the parameters shared by the whole DAG; they are passed down to every Task under the DAG and can also be overridden when an individual Task is created.
default_args = {
    # commonly used
    'owner': 'airflow',  # the owner of this DAG, shown in the Web UI, mainly for easier management
    'depends_on_past': False,  # whether this run depends on the past; if True, yesterday's DAG run must have succeeded
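
A compact sketch putting the four steps above together; the DAG id, task ids, and bash commands are made-up illustrations, not taken from the original post:

# Step 1: import the DAG class, operator classes, and needed Python modules
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# Step 2: default arguments shared by every Task in the DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG(
    dag_id='example_bash_dag',          # hypothetical DAG id
    default_args=default_args,
    start_date=datetime(2019, 9, 27),
    schedule_interval='@daily',
)

# Step 3: create Tasks (Operator objects) with the required arguments
t1 = BashOperator(task_id='print_date', bash_command='date', dag=dag)
t2 = BashOperator(task_id='sleep', bash_command='sleep 5', dag=dag)

# Step 4: set the upstream/downstream dependencies between the Tasks
t1 >> t2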

Airflow Python Script with execution_date in op_kwargs

佐手、 posted on 2019-12-01 09:34:17
With assistance from this answer https://stackoverflow.com/a/41730510/4200352 I am executing a Python file. I use PythonOperator and am trying to pass the execution date as an argument to the script. I believe I can access it somehow through kwargs['execution_date']. The code below fails.

DAG.py:
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from datetime import datetime, timedelta
import sys
import os
sys.path.append(os.path.abspath("/home/glsam/OmegaAPI/airflow/scripts/PyPer_ogi_simple"))
from update_benchmarks import *
default_args = {
    'owner':
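
A hedged sketch of one common way to get execution_date into the callable with the Airflow 1.x PythonOperator: setting provide_context=True makes Airflow pass the run context (including execution_date) as keyword arguments. The DAG id and my_callable below are hypothetical, standing in for whatever the truncated DAG imports from update_benchmarks:

from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def my_callable(**kwargs):
    # With provide_context=True, Airflow 1.x injects the run context here,
    # so the execution date is available without extra op_kwargs plumbing.
    execution_date = kwargs['execution_date']
    print("running for", execution_date)

default_args = {'owner': 'airflow', 'start_date': datetime(2019, 1, 1)}

with DAG('execution_date_example', default_args=default_args,
         schedule_interval='@daily') as dag:
    run_it = PythonOperator(
        task_id='run_it',
        python_callable=my_callable,
        provide_context=True,  # required in Airflow 1.x for context kwargs
    )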
