airflow-scheduler

How to automatically reschedule airflow tasks

让人想犯罪 submitted on 2021-02-20 02:08:20
Question: I am running an hourly process that picks up data from one location ("origin") and moves it to another ("destination"). For the most part, the data arrives at my origin at a specific time and everything works fine, but there can be delays, and when that happens the task in Airflow fails and needs to be manually re-run. One way to solve this is to allow more time for the data to arrive, but I would prefer to do that only if there is in fact a delay. Also, I wouldn't want to have a sensor that is waiting…
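
The excerpt is cut off mid-sentence, but since the asker mentions not wanting a sensor that just sits waiting, one common pattern (my suggestion, not taken from the post) is a FileSensor in reschedule mode, which frees its worker slot between pokes and only triggers the transfer once the data has landed. A minimal sketch, assuming Airflow 1.10+, a filesystem connection named "fs_default", and a hypothetical origin path:

```python
# Minimal sketch: wait for late-arriving data without blocking a worker slot.
# The DAG id, origin path and transfer command are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.contrib.sensors.file_sensor import FileSensor
from airflow.operators.bash_operator import BashOperator

with DAG(
    dag_id="hourly_transfer",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    wait_for_data = FileSensor(
        task_id="wait_for_origin_data",
        fs_conn_id="fs_default",                   # assumed filesystem connection
        filepath="/data/origin/{{ ds_nodash }}/",  # hypothetical path
        mode="reschedule",   # release the worker slot between pokes
        poke_interval=300,   # check every 5 minutes
        timeout=60 * 60,     # give up after 1 hour
    )

    move_data = BashOperator(
        task_id="move_to_destination",
        bash_command="echo 'copy origin -> destination'",  # placeholder
    )

    wait_for_data >> move_data
```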

CeleryExecutor in Airflow are not parallelizing tasks in a subdag

喜你入骨 submitted on 2021-02-19 02:33:47
Question: We're using Airflow 1.10.0, and after some analysis of why some of our ETL processes are taking so long, we saw that the subdags are using a SequentialExecutor instead of the BaseExecutor, even when we configure the CeleryExecutor. I would like to know if this is a bug or expected behavior of Airflow. It doesn't make sense to have the capability to execute tasks in parallel but to lose it for this specific kind of task. Answer 1: It is a typical pattern to use the SequentialExecutor in…
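
The answer is truncated here; the usual workaround (an assumption on my part, not visible in the excerpt) is to pass an executor explicitly to SubDagOperator, which falls back to the SequentialExecutor by default in Airflow 1.10. A minimal sketch with a hypothetical parent DAG and subdag factory:

```python
# Minimal sketch: run a subdag's tasks on Celery instead of the default
# SequentialExecutor used by SubDagOperator in Airflow 1.10.
from datetime import datetime

from airflow import DAG
from airflow.executors.celery_executor import CeleryExecutor
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.subdag_operator import SubDagOperator

START_DATE = datetime(2021, 1, 1)


def build_subdag(parent_dag_id, child_id, start_date):
    """Hypothetical subdag with independent tasks that could run in parallel."""
    subdag = DAG(
        dag_id="{}.{}".format(parent_dag_id, child_id),
        start_date=start_date,
        schedule_interval="@daily",
    )
    for i in range(3):
        DummyOperator(task_id="load_partition_{}".format(i), dag=subdag)
    return subdag


with DAG(
    dag_id="parent_etl",
    start_date=START_DATE,
    schedule_interval="@daily",
    catchup=False,
) as dag:
    etl_section = SubDagOperator(
        task_id="etl_section",
        subdag=build_subdag(dag.dag_id, "etl_section", START_DATE),
        executor=CeleryExecutor(),  # otherwise tasks run sequentially inside the subdag
    )
```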

Airflow worker stuck : Task is in the 'running' state which is not a valid state for execution. The task must be cleared in order to be run

蓝咒 submitted on 2021-02-18 22:45:46
Question: Airflow tasks run without any issues, and then suddenly, halfway through, a task gets stuck and the task instance details show the message above. I cleared my entire database, but I am still getting the same error. The fact is, I am getting this issue only for some DAGs, mostly for long-running jobs. I am getting the error below: [2019-07-03 12:14:56,337] {{models.py:1353}} INFO - Dependencies not met for <TaskInstance: XXXXXX.index_to_es 2019-07-01T13:30:00+00:00 [running]>, dependency 'Task Instance State' FAILED:…
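
The excerpt ends mid-log, so the accepted answer is not visible here; one defensive pattern (my assumption, not taken from the post) is to bound the long-running task with an execution_timeout and retries, so a hung run is failed and re-queued instead of sitting in the 'running' state waiting to be cleared by hand. A minimal sketch with hypothetical DAG/task ids and a placeholder command:

```python
# Minimal sketch: bound a long-running task so a hung run fails and retries
# automatically rather than staying stuck in the 'running' state.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

with DAG(
    dag_id="es_indexing",                 # hypothetical
    start_date=datetime(2021, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
    dagrun_timeout=timedelta(hours=2),    # fail the whole run if it overruns
) as dag:
    index_to_es = BashOperator(
        task_id="index_to_es",
        bash_command="echo 'run indexing job'",   # placeholder command
        execution_timeout=timedelta(minutes=45),  # kill the task if it hangs
        retries=2,
        retry_delay=timedelta(minutes=5),
    )
```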

Is execution_date the date of the DAG run or the Task run?

余生长醉 submitted on 2021-02-17 06:38:06
Question: Is the value of execution_date the date/time when the DAG ran, and is it the same value for all of its tasks, or is execution_date (potentially) different per task within a DAG? Answer 1: The execution_date is the start of the interval for the run. All tasks have the same execution_date value as their run; it's how they're associated with a run in the code. Think of it like this: if you ran a process quarterly and generated a report from the data for that quarter, would you name the report for…
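
To make the interval semantics concrete (my own illustration, not part of the answer): for an hourly DAG, the run covering 13:00–14:00 has execution_date 13:00, and every task in that run renders the same value through templating. A minimal sketch with a hypothetical DAG id:

```python
# Minimal sketch: every task in a run sees the identical execution_date,
# which marks the start of the schedule interval being processed.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

with DAG(
    dag_id="show_execution_date",   # hypothetical
    start_date=datetime(2021, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    # Both tasks print the same execution_date for a given DAG run.
    task_a = BashOperator(task_id="task_a", bash_command="echo '{{ execution_date }}'")
    task_b = BashOperator(task_id="task_b", bash_command="echo '{{ execution_date }}'")

    task_a >> task_b
```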

Airflow (Google Composer) TypeError: can't pickle _thread.RLock objects

狂风中的少年 submitted on 2021-02-11 14:29:54
Question: I'm using Airflow (Google Cloud Composer), but I ran into the exception below: TypeError: can't pickle _thread.RLock objects (followed by the Airflow "Ooops" ASCII-art error page, omitted here).
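
The excerpt stops before any answer; a frequent cause of this error (an assumption here, not confirmed by the post) is attaching a non-picklable object, such as an API client or anything holding a thread lock, to the DAG or to operator arguments, so it gets deep-copied or pickled when the DAG is processed or tasks are cleared. A sketch of the safer pattern, constructing the client inside the callable at run time; the bucket name and callable are hypothetical:

```python
# Minimal sketch: keep non-picklable objects (clients, locks, sessions) out of
# DAG/operator arguments and build them inside the callable at execution time.
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def upload_report(**context):
    # Construct the client here, at run time, so nothing holding an internal
    # _thread.RLock ever needs to be pickled along with the task.
    from google.cloud import storage  # available on Cloud Composer workers

    client = storage.Client()
    bucket = client.bucket("my-report-bucket")  # hypothetical bucket
    bucket.blob("reports/{}.txt".format(context["ds"])).upload_from_string("ok")


with DAG(
    dag_id="composer_upload",           # hypothetical
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="upload_report",
        python_callable=upload_report,
        provide_context=True,   # Airflow 1.10 style; implicit in Airflow 2
    )
```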

Airflow 1.9.0 ExternalTaskSensor retry_delay=30 yields TypeError: can't pickle _thread.RLock objects

故事扮演 submitted on 2021-02-10 18:13:06
Question: As the title says: in Airflow 1.9.0, if you use the retry_delay=30 (or any other number) parameter with the ExternalTaskSensor, the DAG will run just fine until you try to clear the task instances in the Airflow GUI, at which point it returns the following error: "TypeError: can't pickle _thread.RLock objects" (and a nice Oops message). But if you use retry_delay=timedelta(seconds=30), clearing task instances works fine. If I look through the models.py method, the deepcopy should go fine, so it seems…
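
Following the workaround the question itself describes, passing retry_delay as a timedelta rather than a bare integer avoids the deepcopy/pickling failure when clearing task instances. A minimal sketch with hypothetical DAG and task ids (import path shown for Airflow 1.10+; in 1.9 the sensor lives in airflow.operators.sensors):

```python
# Minimal sketch: use a timedelta for retry_delay on ExternalTaskSensor so that
# clearing task instances in the UI does not trip the RLock pickling error.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.sensors.external_task_sensor import ExternalTaskSensor

with DAG(
    dag_id="downstream_dag",            # hypothetical
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    wait_for_upstream = ExternalTaskSensor(
        task_id="wait_for_upstream",
        external_dag_id="upstream_dag",     # hypothetical
        external_task_id="final_task",      # hypothetical
        retries=3,
        retry_delay=timedelta(seconds=30),  # not retry_delay=30
        poke_interval=60,
        timeout=60 * 60,
    )
```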