airflow

Airflow psycopg2.OperationalError: FATAL: sorry, too many clients already

Submitted by 你离开我真会死。 on 2020-01-13 02:54:08
Question: I have a four-node clustered Airflow environment that has been working fine for me for a few months now. EC2 instances: Server 1: webserver, scheduler, Redis queue, PostgreSQL database; Server 2: webserver; Server 3: worker; Server 4: worker. Recently I've been working on a more complex DAG with a few dozen tasks in it, compared to the relatively small ones I was working on beforehand. I'm not sure if that's why I'm only now seeing it, but I sporadically get this error: On…
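One common mitigation for this error (an assumption here, not something stated in the question): each Airflow process on each node keeps its own SQLAlchemy connection pool, so the cluster's total connection count can exceed PostgreSQL's `max_connections`. The pool can be capped per node in `airflow.cfg`; the values below are illustrative, not prescriptive:

```ini
[core]
# Persistent connections each Airflow process keeps open
sql_alchemy_pool_size = 5
# Extra short-lived connections allowed beyond the pool
sql_alchemy_max_overflow = 10
# Recycle connections periodically so idle ones are not hoarded
sql_alchemy_pool_recycle = 1800
```

The rough budget is: total connections ≈ number of Airflow processes × (pool_size + max_overflow), which must stay under PostgreSQL's `max_connections`. Alternatively, placing PgBouncer in front of PostgreSQL multiplexes many client connections onto a small number of server connections.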

How to test Apache Airflow tasks that use XCom

Submitted by 柔情痞子 on 2020-01-13 02:09:08
Question: I'm trying to figure out a way to test a DAG where I have a couple of tasks communicating via XCom. Since the console command only allows me to run tasks from a DAG, is there a way to test the communication without having to run the DAG via the UI? Thanks

Answer 1: Here's a way that worked for me. Even though the Airflow web page states that the test command does not generate or keep any state, running the airflow test command in sequence worked. Basically you do: airflow test my_dag task1 date…
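Another option, not mentioned in the answer above: unit-test the tasks' Python callables directly, without Airflow at all, by injecting a stub task instance that stores XComs in a plain dict. The class and function names below are hypothetical, and the stub only mimics the small slice of the TaskInstance XCom API the test needs:

```python
class FakeTI:
    """Minimal stand-in for the XCom methods of Airflow's TaskInstance."""

    def __init__(self):
        self._xcoms = {}

    def xcom_push(self, key, value):
        self._xcoms[key] = value

    def xcom_pull(self, key=None, task_ids=None):
        # task_ids is accepted but ignored in this single-DAG stub.
        return self._xcoms.get(key)


def producer(ti):
    """Example upstream callable: pushes a payload via XCom."""
    ti.xcom_push(key="payload", value=[1, 2, 3])


def consumer(ti):
    """Example downstream callable: pulls the payload and uses it."""
    payload = ti.xcom_pull(key="payload")
    return sum(payload)


ti = FakeTI()
producer(ti)
print(consumer(ti))  # 6
```

This tests the hand-off logic in isolation; the `airflow test` sequence from the answer then verifies the same flow against the real metadata database.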

Airflow: Passing a dynamic value to a SubDAG operator

Submitted by 拟墨画扇 on 2020-01-12 08:11:35
Question: I am new to Airflow. I have come across a scenario where a parent DAG needs to pass some dynamic number (let's say n) to a SubDAG, and the SubDAG will use this number to dynamically create n parallel tasks. The Airflow documentation doesn't cover a way to achieve this, so I have explored a couple of ways. Option 1 (using XCom pull): I have tried to pass it as an XCom value, but for some reason the SubDAG is not resolving to the passed value. Parent DAG file: def load_dag(**kwargs): number_of_runs = json…
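One likely explanation for the XCom approach failing, stated as an assumption about this setup: XCom values exist only at run time, while the SubDAG's task list is built at DAG-parse time, so n must come from something readable during parsing (an Airflow Variable, a config file) rather than from XCom. A pure-Python sketch of the factory pattern, with the Airflow specifics stubbed out and the helper name invented for illustration:

```python
def build_subdag_task_ids(parent_id: str, n: int) -> list[str]:
    """Return the task IDs a SubDAG factory would create for n parallel
    tasks. In a real factory, each ID would become an operator inside
    the SubDAG; here we only model the parse-time fan-out."""
    return [f"{parent_id}.task_{i}" for i in range(n)]


# n would come from a parse-time source such as an Airflow Variable,
# e.g. n = int(Variable.get("number_of_runs", default_var=1)).
print(build_subdag_task_ids("load_dag", 3))
# ['load_dag.task_0', 'load_dag.task_1', 'load_dag.task_2']
```

The key design point is that changing the Variable changes the DAG's shape on the next parse, whereas an XCom value is simply not available while the shape is being decided.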

How to run Airflow PythonOperator in a virtual environment

Submitted by 做~自己de王妃 on 2020-01-12 05:02:33
Question: I have several Python files that I'm currently executing using BashOperator. This gives me the flexibility to choose the Python virtual environment easily. from airflow import DAG from airflow.operators.bash_operator import BashOperator default_args = { 'owner': 'airflow', 'depends_on_past': False, ...} dag = DAG('python_tasks', default_args=default_args, schedule_interval="23 4 * * *") t1 = BashOperator( task_id='task1', bash_command='~/anaconda3/envs/myenv/bin/python /python_files/python…
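The BashOperator approach works because it simply invokes a specific interpreter binary. That mechanism can be sketched without Airflow at all; the snippet below uses the current interpreter as a stand-in for a path like `~/anaconda3/envs/myenv/bin/python`:

```python
import subprocess
import sys


def run_in_interpreter(python_bin: str, script: str) -> str:
    """Run `script` under a chosen interpreter and return its stdout,
    mirroring what a BashOperator bash_command line does."""
    result = subprocess.run(
        [python_bin, "-c", script],
        capture_output=True,
        text=True,
        check=True,  # raise if the child interpreter exits non-zero
    )
    return result.stdout.strip()


# sys.executable stands in for the virtualenv's python binary.
print(run_in_interpreter(sys.executable, "print(6 * 7)"))  # 42
```

For native Airflow support, `PythonVirtualenvOperator` (available in Airflow 1.10+) creates a throwaway virtualenv per task, though it rebuilds the environment on every run, which is why many setups keep the explicit-interpreter approach above.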

Can't import setuptools

Submitted by 混江龙づ霸主 on 2020-01-11 13:43:11
Question: I have done nothing, and everything is broken. aviv$ python3 -c 'import setuptools' Traceback (most recent call last): File "/usr/lib/python3.5/pkgutil.py", line 407, in get_importer importer = sys.path_importer_cache[path_item] KeyError: '' This means that pip is broken and Airflow is broken. Everything is broken. Please help. EDIT: It was suggested that this is a duplicate of this question: Python 3: ImportError "No Module named Setuptools". I'm doing a different thing and getting a…
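Some context on the traceback line (the real fix for a broken setuptools install is usually to reinstall it, e.g. via get-pip.py): `KeyError: ''` comes from `pkgutil.get_importer()` looking up the empty-string entry (the current directory on `sys.path`) in `sys.path_importer_cache`; on a cache miss this KeyError is raised and normally handled internally. A small standard-library illustration of the cache involved:

```python
import importlib
import os
import pkgutil
import sys

# pkgutil.get_importer() consults sys.path_importer_cache before
# building a new finder, which is exactly the line in the traceback.
path = os.getcwd()
finder = pkgutil.get_importer(path)
print(path in sys.path_importer_cache)  # True: the finder got cached

# Dropping a stale entry forces a fresh finder on the next lookup;
# importlib.invalidate_caches() clears related caches as well.
sys.path_importer_cache.pop(path, None)
importlib.invalidate_caches()
print(pkgutil.get_importer(path) is not None)  # True
```

If the import machinery itself behaves like this but `import setuptools` still fails, the package files on disk are the problem, not the cache.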

Airflow does not backfill the latest run

Submitted by  ̄綄美尐妖づ on 2020-01-11 06:36:23
Question: For some reason, Airflow doesn't seem to trigger the latest run for a DAG with a weekly schedule interval. Current date: $ date $ Tue Aug 9 17:09:55 UTC 2016 DAG: from datetime import datetime from datetime import timedelta from airflow import DAG from airflow.operators.bash_operator import BashOperator dag = DAG( dag_id='superdag', start_date=datetime(2016, 7, 18), schedule_interval=timedelta(days=7), default_args={ 'owner': 'Jon Doe', 'depends_on_past': False } ) BashOperator( task_id=…
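This is Airflow's documented scheduling semantics rather than a bug: a run with execution_date T fires only once the interval after T has fully elapsed, i.e. at T + schedule_interval. Replaying that rule for this DAG shows which weekly runs have fired by the quoted date of Aug 9:

```python
from datetime import datetime, timedelta

start = datetime(2016, 7, 18)        # the DAG's start_date
interval = timedelta(days=7)          # weekly schedule_interval
now = datetime(2016, 8, 9, 17, 9)    # "current date" from the question

# A run whose period starts at T is triggered at T + interval.
triggered = []
t = start
while t + interval <= now:
    triggered.append(t.date().isoformat())
    t += interval

print(triggered)  # ['2016-07-18', '2016-07-25', '2016-08-01']
```

So on Aug 9 the latest completed run has execution_date 2016-08-01; the 2016-08-08 run only fires on Aug 15, once its week-long period has ended. The "missing" latest run is simply a period still in progress.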

Airflow connection UI not visible

Submitted by 给你一囗甜甜゛ on 2020-01-11 05:45:27
Question: Airflow version: 1.10.2. Ubuntu: 18.04 (bionic). Python: 3.6.5. Issue: I am not sure how, but the connections are not visible when I click Admin in the menu. Has someone ever faced this? When I edit the URL and go to localhost:8080/admin/connections I see the response below. This was working fine before, but when I list the connections from the airflow CLI, it works. I am not sure why it is not visible in the UI yet accessible from the CLI. How should I give the UI user access to 'Connections'?

Apache Airflow DAG cannot import local module

Submitted by 浪子不回头ぞ on 2020-01-11 04:27:25
Question: I do not seem to understand how to import modules into an Apache Airflow DAG definition file. I want to do this to be able to create a library that makes declaring tasks with similar settings less verbose, for instance. Here is the simplest example I can think of that replicates the issue: I modified the Airflow tutorial (https://airflow.apache.org/tutorial.html#recap) to simply import a module and run a definition from that module, like so: Directory structure: - dags/ -- __init__.py…
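A common cause of this, offered as an assumption about the setup rather than a diagnosis: Airflow adds the dags/ folder itself to `sys.path`, so a sibling helper module is imported as `import my_lib`, not `import dags.my_lib`, and the module must also be present on every scheduler and worker node. The import mechanic can be reproduced without Airflow; file and function names here are hypothetical:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Simulate a dags/ folder containing a helper module next to a DAG file.
dags = Path(tempfile.mkdtemp())
(dags / "my_lib.py").write_text(
    "def default_task_args():\n"
    "    return {'owner': 'airflow'}\n"
)

# Airflow puts the DAGs folder on sys.path; that is what makes a bare
# `import my_lib` work from inside a DAG definition file.
sys.path.insert(0, str(dags))
my_lib = importlib.import_module("my_lib")
print(my_lib.default_task_args())  # {'owner': 'airflow'}
```

If the helper instead lives outside the dags/ folder, its parent directory has to reach `sys.path` some other way, typically via the `PYTHONPATH` environment variable on every node.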

How to account for daylight savings time when using a cron schedule in Airflow

Submitted by 心已入冬 on 2020-01-10 04:28:47
Question: In Airflow, I'd like a job to run at a specific time each day in a non-UTC timezone. How can I go about scheduling this? The problem is that once daylight savings time is triggered, my job will either run an hour too soon or an hour too late. In the Airflow docs, it seems this is a known issue: "In case you set a cron schedule, Airflow assumes you will always want to run at the exact same time. It will then ignore daylight savings time. Thus, if you have a schedule that says run at…"
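The underlying problem can be seen with just the standard library (Airflow 1.10+ addresses it by accepting a timezone-aware pendulum start_date, letting the scheduler compensate for DST): a local wall-clock time maps to a different UTC offset on either side of the transition, so a fixed UTC cron drifts by an hour in local time. The dates below bracket the 2021 US spring-forward on March 14:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library in Python 3.9+

tz = ZoneInfo("America/New_York")

# 09:00 local wall-clock time, before and after the DST transition.
before = datetime(2021, 3, 13, 9, 0, tzinfo=tz)  # EST, UTC-5
after = datetime(2021, 3, 15, 9, 0, tzinfo=tz)   # EDT, UTC-4

print(before.utcoffset())  # -1 day, 19:00:00  (i.e. UTC-5)
print(after.utcoffset())   # -1 day, 20:00:00  (i.e. UTC-4)
```

A cron entry fixed at 14:00 UTC therefore hits 09:00 local before the transition but 10:00 local after it, which is exactly the hour-early/hour-late behavior described in the question.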