airflow

Airflow: ExternalTaskSensor doesn't trigger the task

北城余情 submitted on 2019-11-29 15:25:01
I have already seen this and this questions on SO and made the changes accordingly. However, my dependent DAG still gets stuck in poking state. Below is my master DAG:

from airflow import DAG
from airflow.operators.jdbc_operator import JdbcOperator
from datetime import datetime
from airflow.operators.bash_operator import BashOperator

today = datetime.today()

default_args = {
    'depends_on_past': False,
    'retries': 0,
    'start_date': datetime(today.year, today.month, today.day),
    'schedule_interval': '@once'
}

dag = DAG('call-procedure-and-bash', default_args=default_args)

call_procedure =
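For reference, a minimal sketch (not from the post) of how the dependent DAG's ExternalTaskSensor is typically wired up in Airflow 1.10-style code, assuming it should wait for the call-procedure-and-bash DAG above. The dependent DAG id, task ids and schedule are hypothetical, and the external_task_id is only guessed from the truncated variable name; the sensor fires only when both DAGs share the same execution date unless execution_delta or execution_date_fn is supplied:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.sensors.external_task_sensor import ExternalTaskSensor

# Hypothetical dependent DAG; ids and schedule are illustrative.
dag = DAG(
    'dependent-dag',
    start_date=datetime(2019, 11, 29),
    schedule_interval='@once',  # should line up with the master DAG's schedule
)

wait_for_master = ExternalTaskSensor(
    task_id='wait_for_master',
    external_dag_id='call-procedure-and-bash',  # DAG id from the snippet above
    external_task_id='call_procedure',          # assumed from the truncated variable name
    # execution_delta=timedelta(...),           # needed if the two execution dates differ
    dag=dag,
)

do_work = BashOperator(
    task_id='do_work',
    bash_command='echo "master DAG finished"',
    dag=dag,
)

wait_for_master >> do_work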

AttributeError: 'MSVCCompiler' object has no attribute 'linker_exe'

Deadly submitted on 2019-11-29 14:44:54
Question: I'm trying to install Airflow, but keep getting an error. The command:

pip install apache-airflow

I installed Visual Studio with the proper packages, installed misaka, and updated both pip and setuptools. The results:

Collecting apache-airflow
  Using cached https://files.pythonhosted.org/packages/fc/c9/db9c285b51a58c426433787205d86e91004662d99b1f5253295619bdb0e4/apache_airflow-1.10.4-py2.py3-none-any.whl
Requirement already satisfied: future<0.17,>=0.16.0 in c:\users\ben\appdata

Airflow Logs BrokenPipeException

笑着哭i submitted on 2019-11-29 13:22:27
I'm using a clustered Airflow environment where I have four AWS EC2 instances for the servers:

Server 1: Webserver, Scheduler, Redis Queue, PostgreSQL Database
Server 2: Webserver
Server 3: Worker
Server 4: Worker

My setup has been working perfectly fine for three months now, but sporadically, about once a week, I get a Broken Pipe Exception when Airflow is attempting to log something.

*** Log file isn't local.
*** Fetching here: http://ip-1-2-3-4:8793/log/foobar/task_1/2018-07-13T00:00:00/1.log
[2018-07-16 00:00:15,521] {cli.py:374} INFO - Running on host ip-1-2-3-4
[2018-07-16 00

apache-airflow 1.9: default timezone set to non-UTC

笑着哭i submitted on 2019-11-29 12:49:48
Question: I recently upgraded from Airflow 1.8 to apache-airflow 1.9. The upgrade was successful, I have scaled the environment using the Celery Executor, and everything seems to be working fine, but the DAG and task start dates, execution dates, etc. all appear in UTC, and the scheduled DAGs run in UTC. Before the upgrade they used to run in the local timezone, which is PDT. Any ideas on how to make PDT the default timezone in Airflow? I have tried using
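For reference (not part of the post): Airflow 1.9 stores and schedules everything in UTC, and timezone-aware scheduling only arrived in Airflow 1.10, where you can set default_timezone in airflow.cfg and/or attach a pendulum timezone to start_date. A minimal sketch of the 1.10+ approach, with a hypothetical DAG id:

from datetime import datetime

import pendulum
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# Airflow 1.10+ only: a timezone-aware start_date lets the DAG be scheduled and
# displayed relative to America/Los_Angeles rather than plain UTC.
local_tz = pendulum.timezone('America/Los_Angeles')

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2019, 1, 1, tzinfo=local_tz),
}

dag = DAG(
    'pdt_scheduled_dag',            # hypothetical DAG id
    default_args=default_args,
    schedule_interval='0 8 * * *',  # evaluated against the DAG's timezone
)

run = BashOperator(task_id='run', bash_command='date', dag=dag)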

What is the difference between airflow trigger rule “all_done” and “all_success”?

自闭症网瘾萝莉.ら submitted on 2019-11-29 12:31:00
Question: One of the requirements in the workflow I am working on is to wait for some event to happen for a given time; if it does not happen, mark the task as failed, but the downstream task should still be executed. I am wondering if "all_done" means all the dependency tasks are done, no matter whether they succeeded or not.

Answer 1: https://airflow.incubator.apache.org/concepts.html#trigger-rules

all_done means all operations have finished working. Maybe they succeeded, maybe not. all_success means all operations
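A small illustrative sketch (not from the question) of the pattern being described: an upstream sensor that fails on timeout, and a downstream task with trigger_rule='all_done' that runs either way. The DAG id, file path, and timings are made up:

from datetime import datetime

from airflow import DAG
from airflow.contrib.sensors.file_sensor import FileSensor
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    'trigger_rule_demo',             # hypothetical DAG id
    start_date=datetime(2019, 1, 1),
    schedule_interval=None,
)

# Waits for an "event" (a marker file here) and fails once `timeout` is exceeded.
wait_for_event = FileSensor(
    task_id='wait_for_event',
    filepath='/tmp/event_marker',    # hypothetical path
    poke_interval=60,
    timeout=30 * 60,                 # give up (and fail) after 30 minutes
    dag=dag,
)

# Runs whether the sensor succeeded or failed: 'all_done' only requires that
# every upstream task has finished, unlike the default 'all_success'.
run_anyway = BashOperator(
    task_id='run_anyway',
    bash_command='echo "upstream finished one way or another"',
    trigger_rule='all_done',
    dag=dag,
)

wait_for_event >> run_anyway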

How to consider daylight savings time when using cron schedule in Airflow

南笙酒味 submitted on 2019-11-29 11:57:00
In Airflow, I'd like a job to run at a specific time each day in a non-UTC timezone. How can I go about scheduling this? The problem is that once daylight saving time kicks in, my job will either run an hour too early or an hour too late. In the Airflow docs, it seems like this is a known issue:

In case you set a cron schedule, Airflow assumes you will always want to run at the exact same time. It will then ignore day light savings time. Thus, if you have a schedule that says run at end of interval every day at 08:00 GMT+1 it will always run end of interval 08:00 GMT+1, regardless if
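One workaround pattern sometimes used for this (a hedged sketch, not taken from the post or the docs): schedule the DAG on both candidate UTC hours and let a ShortCircuitOperator skip the run whose actual start time is not the desired local hour. The timezone, DAG id, hours and task ids below are all assumptions:

from datetime import datetime

import pendulum
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import ShortCircuitOperator

LOCAL_TZ = 'America/Chicago'   # assumed target timezone
TARGET_LOCAL_HOUR = 8          # we want the work done at 08:00 local, year-round


def at_target_local_hour(**_):
    # True only when the current wall-clock hour in the local timezone matches.
    return pendulum.now(LOCAL_TZ).hour == TARGET_LOCAL_HOUR


dag = DAG(
    'local_time_job',                   # hypothetical DAG id
    start_date=datetime(2019, 1, 1),
    schedule_interval='0 13,14 * * *',  # both UTC hours that can map to 08:00 local
)

guard = ShortCircuitOperator(
    task_id='only_at_local_08',
    python_callable=at_target_local_hour,
    provide_context=True,
    dag=dag,
)

work = BashOperator(
    task_id='work',
    bash_command='echo "running at roughly 08:00 local time"',
    dag=dag,
)

guard >> work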

Fusing operators together

我只是一个虾纸丫 submitted on 2019-11-29 11:38:25
I'm still in the process of deploying Airflow and I've already felt the need to merge operators together. The most common use case would be coupling an operator and the corresponding sensor. For instance, one might want to chain together the EmrStepOperator and EmrStepSensor. I'm creating my DAGs programmatically, and the biggest one of those contains 150+ (identical) branches, each performing the same series of operations on different bits of data (tables). Therefore, clubbing together tasks that make up a single logical step in my DAG would be of great help. Here are 2 contending
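The two contending approaches are cut off above; purely as an illustration of one common way to "club" an operator with its sensor, here is a hedged sketch of a small factory function built around Airflow's contrib EmrAddStepsOperator and EmrStepSensor (the classes the post's EmrStepOperator/EmrStepSensor presumably refer to). The helper name and task-id scheme are made up, and the EMR arguments would need real values:

from airflow.contrib.operators.emr_add_steps_operator import EmrAddStepsOperator
from airflow.contrib.sensors.emr_step_sensor import EmrStepSensor


def emr_step(dag, name, job_flow_id, steps, aws_conn_id='aws_default'):
    """Build an add-steps operator plus its watching sensor and chain them.

    Returns (first_task, last_task) so callers can wire the fused pair into a
    branch with upstream >> first_task and last_task >> downstream.
    """
    add_step = EmrAddStepsOperator(
        task_id='{}_add_step'.format(name),
        job_flow_id=job_flow_id,
        steps=steps,
        aws_conn_id=aws_conn_id,
        dag=dag,
    )
    watch_step = EmrStepSensor(
        task_id='{}_watch_step'.format(name),
        job_flow_id=job_flow_id,
        # Pull the step id that the add-steps task pushed to XCom.
        step_id="{{{{ task_instance.xcom_pull(task_ids='{}_add_step')[0] }}}}".format(name),
        aws_conn_id=aws_conn_id,
        dag=dag,
    )
    add_step >> watch_step
    return add_step, watch_step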

Kubernetes deployment read-only filesystem error

余生颓废 submitted on 2019-11-29 10:20:55
Question: I am facing an error while deploying Airflow on Kubernetes (precisely this version of Airflow: https://github.com/puckel/docker-airflow/blob/1.8.1/Dockerfile) regarding write permissions on the filesystem. The error displayed in the logs of the pod is:

sed: couldn't open temporary file /usr/local/airflow/sed18bPUH: Read-only file system
sed: -e expression #1, char 131: unterminated `s' command
sed: -e expression #1, char 118: unterminated `s' command
Initialize database...
sed: couldn't

BashOperator doesn't run bash file in Apache Airflow

谁说胖子不能爱 submitted on 2019-11-29 04:21:50
I just started using Apache Airflow. I am trying to run a test.sh file from Airflow, however it is not working. Following is my code; the file name is test.py:

import os
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2015, 6, 1),
    'email': ['airflow@airflow.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    # 'queue': 'bash_queue',
    # 'pool': 'backfill',
    # 'priority_weight': 10,
    # 'end
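For reference (not from the truncated post): when bash_command points directly at a *.sh file, Airflow tries to render that file as a Jinja template, which is a frequent reason a script "does not run". A minimal sketch of the two usual ways around it, with hypothetical paths and ids:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    'run_shell_script',             # hypothetical DAG id
    start_date=datetime(2015, 6, 1),
    schedule_interval=None,
)

# Option 1: trailing space after the script path, so Airflow does not try to
# load "test.sh" itself as a Jinja template.
run_script = BashOperator(
    task_id='run_test_sh',
    bash_command='/home/airflow/scripts/test.sh ',   # the trailing space is deliberate
    dag=dag,
)

# Option 2: invoke bash explicitly with the script's absolute path.
run_script_alt = BashOperator(
    task_id='run_test_sh_alt',
    bash_command='bash /home/airflow/scripts/test.sh',
    dag=dag,
)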

Airflow tasks get stuck at “queued” status and never get running

假装没事ソ submitted on 2019-11-29 03:08:51
I'm using Airflow v1.8.1 and run all components (worker, web, flower, scheduler) on Kubernetes & Docker. I use the Celery Executor with Redis, and my tasks look like:

(start) -> (do_work_for_product1)
        ├ -> (do_work_for_product2)
        ├ -> (do_work_for_product3)
        ├ …

So the start task has multiple downstreams. And I set up the concurrency-related configuration as below:

parallelism = 3
dag_concurrency = 3
max_active_runs = 1

Then when I run this DAG manually (not sure if it ever happens on a scheduled run), some downstreams get executed, but others get stuck at "queued" status. If I clear the task from
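For context (not from the post): parallelism and dag_concurrency are airflow.cfg settings, and the same limits can also be set per DAG. With a value of 3, only three tasks of the DAG can run at once, so the remaining fan-out branches legitimately wait in "queued" until a slot frees up (tasks that never leave "queued" at all usually point at the executor or workers instead). A hedged sketch of the per-DAG equivalents, with a made-up DAG id:

from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

dag = DAG(
    'fan_out_products',          # hypothetical DAG id
    start_date=datetime(2019, 1, 1),
    schedule_interval='@daily',
    concurrency=3,               # at most 3 running task instances for this DAG
    max_active_runs=1,           # only one active DAG run at a time
)

start = DummyOperator(task_id='start', dag=dag)
for i in range(1, 6):
    # With concurrency=3, only three of these run at once; the rest stay
    # "queued" until a slot frees up.
    start >> DummyOperator(task_id='do_work_for_product{}'.format(i), dag=dag)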