airflow

How to automatically reschedule airflow tasks

让人想犯罪 __ submitted on 2021-02-20 02:08:20
Question: I am running an hourly process that picks up data from one location ("origin") and moves it to another ("destination"). For the most part, the data arrives at my origin at a specific time and everything works fine, but there can be delays, and when that happens the task in Airflow fails and needs to be re-run manually. One way to solve this is to give the data more time to arrive, but I prefer to do that only if there is in fact a delay. Also, I wouldn't want to have a sensor that is waiting
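One common way to handle late data like this, sketched here as an assumption and not as the answer to the truncated question, is to let the task retry by itself instead of failing on the first attempt. `retries` and `retry_delay` are standard Airflow operator arguments that can be set through `default_args`; the values below and the helper `worst_case_wait` are hypothetical and only illustrate how much extra waiting a given setting allows.

```python
from datetime import timedelta

# Hypothetical values: retry up to 3 times, 10 minutes apart.
# In a DAG file this dict would be passed as DAG(..., default_args=default_args).
default_args = {
    "retries": 3,
    "retry_delay": timedelta(minutes=10),
}


def worst_case_wait(args):
    """Total time spent waiting between attempts if every retry is used
    (fixed delay, no exponential backoff)."""
    return args["retries"] * args["retry_delay"]


print(worst_case_wait(default_args))  # 0:30:00
```

With these numbers a delayed file gets up to 30 extra minutes before the task is marked failed, and nothing extra happens when the data arrives on time. For an hourly process the total should probably stay below the schedule interval so that runs do not pile up.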

Airflow issue with branching tasks

不问归期 submitted on 2021-02-19 07:34:42
Question: I am trying to set up a DAG where one task runs every minute and another task runs on every 5th minute (right before the 1-minute task). It's really just a test; I am not planning to run jobs at such short intervals. Visually, my DAG looks like this: And the code itself like this:
    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from airflow.operators.python_operator import BranchPythonOperator
    from datetime import datetime, timedelta
    default_args = {
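The question's code is cut off, so the following is only a minimal sketch of the kind of branching function such a DAG might use, not the author's implementation. A `BranchPythonOperator` calls its `python_callable` and follows the task whose `task_id` the function returns. The task ids and the function name here are hypothetical, and the function is written in plain Python so it can be tried without an Airflow installation.

```python
from datetime import datetime

# Hypothetical task ids; the original code is truncated before any are defined.
EVERY_MINUTE_TASK = "one_minute_task"
FIFTH_MINUTE_TASK = "five_minute_task"


def choose_branch(execution_date, **_):
    """Return the task_id to follow: the 5-minute task when the minute is
    divisible by 5, otherwise the 1-minute task."""
    if execution_date.minute % 5 == 0:
        return FIFTH_MINUTE_TASK
    return EVERY_MINUTE_TASK


print(choose_branch(datetime(2021, 2, 19, 7, 35)))  # five_minute_task
print(choose_branch(datetime(2021, 2, 19, 7, 36)))  # one_minute_task
```

Because the branch operator skips the directly downstream tasks that are not returned, a 1-minute task that should also run after the 5-minute task typically needs its trigger rule adjusted; that part is outside this sketch.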

CeleryExecutor in Airflow are not parallelizing tasks in a subdag

喜你入骨 submitted on 2021-02-19 02:33:47
Question: We're using Airflow 1.10.0, and after analyzing why some of our ETL processes take so long, we saw that the subdags use a SequentialExecutor instead of the BaseExecutor, even when we configure the CeleryExecutor. I would like to know whether this is a bug or expected behavior in Airflow. It doesn't make sense to be able to execute tasks in parallel but lose that capability for one specific kind of task.
Answer 1: It is a typical pattern to use the SequentialExecutor in

Airflow worker stuck : Task is in the 'running' state which is not a valid state for execution. The task must be cleared in order to be run

蓝咒 submitted on 2021-02-18 22:45:46
Question: Airflow tasks run without any issues, and then suddenly, partway through, a task gets stuck and the task instance details show the message above. I cleared my entire database, but I am still getting the same error. I get this issue only for some DAGs, mostly with long-running jobs. I am getting the error below: [2019-07-03 12:14:56,337] {{models.py:1353}} INFO - Dependencies not met for <TaskInstance: XXXXXX.index_to_es 2019-07-01T13:30:00+00:00 [running]>, dependency 'Task Instance State' FAILED:

Is execution_date the date of the DAG run or the Task run?

余生长醉 submitted on 2021-02-17 06:38:06
Question: Is the value of execution_date the date/time when the DAG ran, and so the same value for all of its tasks, or is execution_date (potentially) different per task within a DAG?
Answer 1: The execution_date is the start of the interval for the run. All tasks have the same execution_date value as their run. It's how they're associated with a run in the code. Think of it like this: If you ran a process quarterly and generated a report from data for that quarter, would you name the report for
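The answer's point that execution_date marks the start of the interval, not the moment the run kicks off, can be illustrated with plain date arithmetic. This is a small sketch assuming a fixed-length schedule; the helper `execution_date_for` and the task names are hypothetical and not part of Airflow.

```python
from datetime import datetime, timedelta


def execution_date_for(trigger_time, interval):
    """Start of the interval that ends at trigger_time,
    assuming a fixed-length schedule interval."""
    return trigger_time - interval


# A daily run that kicks off at midnight on 2021-02-17 covers 2021-02-16.
run_started = datetime(2021, 2, 17, 0, 0)
execution_date = execution_date_for(run_started, timedelta(days=1))
print(execution_date)  # 2021-02-16 00:00:00

# Every task in that run is associated with the same value
# (task names are hypothetical).
task_dates = {task: execution_date for task in ("extract", "load")}
```

The run is labeled with the period whose data it processes, so a task that starts minutes or hours after the run begins still carries the same execution_date as every other task in that run.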