airflow

Airflow Generate Dynamic Tasks in Single DAG, Task N+1 is Dependent on Task N

浪尽此生 submitted on 2020-07-05 03:27:05
Question: When generating tasks dynamically, I need Task 2 to be dependent on Task 1, i.e. Task1 >> Task2 or task2.set_upstream(task1). Since the task_ids are (or seem to be) evaluated up front, I cannot set the dependency in advance; any help would be appreciated. The Component(i) tasks generate fine, except that they all run at once. for i in range(1,10): task_id='Component'+str(i) task_id = BashOperator( task_id='Component'+str(i), bash_command="echo {{ ti.xcom_pull task_ids='SomeOtherTaskXcom', key= …
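A common fix is to keep a reference to the previously created operator and wire each new task to it inside the loop. Below is a minimal sketch of that pattern; the dag_id, the XCom-producing task name and the bash command are placeholders loosely based on the question, not the asker's actual code.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id='dynamic_components',      # hypothetical dag_id
    start_date=datetime(2020, 7, 1),
    schedule_interval=None,
)

previous = None
for i in range(1, 10):
    current = BashOperator(
        task_id='Component' + str(i),
        bash_command="echo {{ ti.xcom_pull(task_ids='SomeOtherTaskXcom') }}",
        dag=dag,
    )
    if previous is not None:
        previous >> current           # Component(i) waits for Component(i-1)
    previous = current
```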

Airflow : dag run with execution_date = trigger_date = fixed_schedule

纵然是瞬间 submitted on 2020-07-04 13:10:46
Question: In Airflow, I would like to run a DAG every Monday at 8 AM (the execution_date should of course be "the current Monday, 8 AM"). The relevant parameters for this workflow are: start_date: "2018-03-19", schedule_interval: "0 8 * * MON". I expect to see a DAG run every Monday at 8 AM, the first one on 2018-03-19 at 8 AM with execution_date = 2018-03-19-08-00-00, and so on each Monday. However, that is not what happens: the DAG is not started on 19/03/18 at 8 AM. The real behaviour …
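The behaviour usually comes down to Airflow's scheduling rule: a run stamped with execution_date 2018-03-19 08:00 is only launched after the interval it covers has closed, i.e. around 2018-03-26 08:00. A minimal sketch of the setup described in the question (dag_id and task are hypothetical):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

dag = DAG(
    dag_id='weekly_monday_8am',        # hypothetical
    start_date=datetime(2018, 3, 19),
    schedule_interval='0 8 * * MON',
    # The run with execution_date=2018-03-19 08:00 starts once that weekly
    # interval has ended, i.e. at 2018-03-26 08:00 (plus scheduler latency).
)

DummyOperator(task_id='noop', dag=dag)
```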

Airflow - Initiation of DB stuck in SQL Server

大憨熊 submitted on 2020-06-29 06:17:48
Question: Trying to set up Airflow using SQL Server as the backend, but it gets stuck during the initdb command: user@computer /my/home> airflow initdb [2019-09-13 12:10:04,375] {__init__.py:51} INFO - Using executor SequentialExecutor DB: mssql+pymssql://TestServiceUser:***@my_sql_Server:1433/airflow [2019-09-13 12:10:05,101] {db.py:369} INFO - Creating tables INFO [alembic.runtime.migration] Context impl MSSQLImpl. INFO [alembic.runtime.migration] Will assume transactional DDL. INFO [alembic.runtime …
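Before digging into the Alembic migrations, it can help to confirm that the mssql+pymssql URL configured as sql_alchemy_conn is reachable outside Airflow. A small diagnostic sketch follows; the server name, user and database are the placeholders shown in the question, and this only checks connectivity, it does not by itself explain a migration hang.

```python
from sqlalchemy import create_engine

# Same SQLAlchemy URL style as sql_alchemy_conn in airflow.cfg (placeholders).
engine = create_engine(
    "mssql+pymssql://TestServiceUser:<password>@my_sql_Server:1433/airflow"
)

with engine.connect() as conn:
    # If this hangs or fails, the problem is the connection itself, not initdb.
    print(conn.execute("SELECT @@VERSION").scalar())
```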

Schedule a DAG in airflow to run for every 5 minutes , starting from today i.e., 2019-12-18

只谈情不闲聊 submitted on 2020-06-29 06:00:54
Question: I am trying to run a DAG every 5 minutes starting from today (2019-12-18). I defined my start date as start_date: dt.datetime(2019, 12, 18, 10, 00, 00) and the schedule interval as schedule_interval='*/5 * * * *'. When I start the airflow scheduler, I don't see any of my tasks running. But when I change the start_date to start_date: dt.datetime(2019, 12, 17, 10, 00, 00), i.e. yesterday's date, the DAG runs continuously, roughly every 10 seconds rather than every 5 minutes. I think the solution to this …
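Both symptoms follow from the same two rules: the first run only fires once start_date plus one full interval has passed, and with catchup enabled the scheduler backfills every missed interval as fast as it can (hence the apparent 10-second cadence). A hedged sketch, with a hypothetical dag_id:

```python
import datetime as dt

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

dag = DAG(
    dag_id='every_five_minutes',                      # hypothetical
    start_date=dt.datetime(2019, 12, 17, 10, 0, 0),   # at least one interval in the past
    schedule_interval='*/5 * * * *',
    catchup=False,   # run only the latest interval instead of backfilling since start_date
)

DummyOperator(task_id='noop', dag=dag)
```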

Helm stable/airflow - Custom values for Airflow deployment with Shared Persistent Volume using Helm chart failing

放肆的年华 submitted on 2020-06-28 03:28:40
Question: Objective: I want to deploy Airflow on Kubernetes where the pods have access to the same DAGs, in a shared Persistent Volume. According to the documentation (https://github.com/helm/charts/tree/master/stable/airflow#using-one-volume-for-both-logs-and-dags), it seems I have to set and pass these values to Helm: extraVolume, extraVolumeMount, persistence.enabled, logsPersistence.enabled, dags.path, logs.path. Problem: Any custom values I pass when installing the official Helm chart result in …

How to use xcom_push=True and auto_remove=True at the same time when using DockerOperator?

心不动则不痛 submitted on 2020-06-27 16:49:31
Question: Problem: When running DockerOperator with xcom_push=True, xcom_all=True and auto_remove=True, the task raises an error as if the container were deleted before its STDOUT could be read. Example: Consider the following DAG as an example: from datetime import datetime, timedelta from airflow import DAG from airflow.operators.docker_operator import DockerOperator from airflow.operators.python_operator import PythonOperator # Default (but overridable) arguments for Operators instantiations default_args = …
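For context, here is a minimal sketch of the pattern the question describes, with one commonly suggested workaround: leave auto_remove off so the container still exists when Airflow reads its output, and prune stopped containers separately. The image and command are placeholders, not the asker's code.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.docker_operator import DockerOperator

dag = DAG(
    dag_id='docker_xcom_example',        # hypothetical
    start_date=datetime(2020, 6, 1),
    schedule_interval=None,
)

DockerOperator(
    task_id='produce_xcom',
    image='alpine:3.12',                 # placeholder image
    command='echo hello-from-container', # its stdout becomes the XCom value
    xcom_push=True,
    xcom_all=True,
    auto_remove=False,  # workaround: with True the container may be gone before stdout is read
    dag=dag,
)
```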

Google Cloud Composer(Airflow) - dataflow job inside a DAG executes successfully, but the DAG fails

纵饮孤独 submitted on 2020-06-27 07:29:26
Question: My DAG looks like this: default_args = { 'start_date': airflow.utils.dates.days_ago(0), 'retries': 0, 'dataflow_default_options': { 'project': 'test', 'tempLocation': 'gs://test/dataflow/pipelines/temp/', 'stagingLocation': 'gs://test/dataflow/pipelines/staging/', 'autoscalingAlgorithm': 'BASIC', 'maxNumWorkers': '1', 'region': 'asia-east1' } } dag = DAG( dag_id='gcs_avro_to_bq_dag', default_args=default_args, description='ETL for loading data from GCS(present in the avro format) to BQ', …
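For readability, here is the flattened default_args from the excerpt laid out as a runnable skeleton, with a hypothetical Dataflow task attached so the DAG shape is visible. The operator choice (DataflowTemplateOperator from the Airflow 1.10-era contrib package used by Composer at the time) and the template path are assumptions, not the asker's code.

```python
from airflow import DAG
from airflow.contrib.operators.dataflow_operator import DataflowTemplateOperator
from airflow.utils.dates import days_ago

default_args = {
    'start_date': days_ago(0),
    'retries': 0,
    'dataflow_default_options': {
        'project': 'test',
        'tempLocation': 'gs://test/dataflow/pipelines/temp/',
        'stagingLocation': 'gs://test/dataflow/pipelines/staging/',
        'autoscalingAlgorithm': 'BASIC',
        'maxNumWorkers': '1',
        'region': 'asia-east1',
    },
}

dag = DAG(
    dag_id='gcs_avro_to_bq_dag',
    default_args=default_args,
    description='ETL for loading avro data from GCS to BQ',
    schedule_interval=None,
)

DataflowTemplateOperator(
    task_id='gcs_avro_to_bq',                             # hypothetical task
    template='gs://test/dataflow/templates/avro_to_bq',   # placeholder template path
    dag=dag,
)
```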

Airflow too many connections as a default

跟風遠走 submitted on 2020-06-27 06:56:25
Question: I opened up Airflow and checked the connections, and found that far too many connections are defined by default. Any ideas on how to remove the ones I don't use? I'd also love to know the minimum set of conn_ids needed to run it. Architecture: LocalExecutor (no extra brokers of any kind), Postgres as the meta DB. Yet it lists 17 connections. Here are the connection lists. This is the airflow.cfg: [core] # The home folder for airflow, default is ~/airflow airflow_home = /usr/src/app # The folder where …
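Those entries are the example connections that `airflow initdb` seeds into the metadata DB; with a LocalExecutor/Postgres setup, only the metadata DB connection and whatever conn_ids your DAGs actually reference are needed. A hedged sketch of removing unused ones through the ORM follows; the conn_ids listed are examples of default entries, not a definitive minimum list.

```python
from airflow import settings
from airflow.models import Connection

# Example default conn_ids to drop; adjust to whatever your DAGs do not use.
unused = ['aws_default', 'presto_default', 'hive_cli_default']

session = settings.Session()
session.query(Connection).filter(Connection.conn_id.in_(unused)).delete(
    synchronize_session=False
)
session.commit()
session.close()
```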

Airflow - Skip future task instance without making changes to dag file

谁都会走 submitted on 2020-06-26 06:17:04
Question: I have a DAG 'abc' scheduled to run every day at 7 AM CST, and there is a task 'xyz' in that DAG. For some reason, I do not want to run the task 'xyz' for tomorrow's instance. How can I skip that particular task instance? I do not want to make any changes to the code, as I do not have access to the Prod code and the task is in the Prod environment now. Is there any way to do this from the command line? I'd appreciate any help on this. Answer 1: You can mark the unwanted tasks as succeeded using the run command. …
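The answer refers to marking the instance succeeded with Airflow's run command. As an alternative sketch of the same idea in Python, the task instance can be pre-created in the SUCCESS state in the metadata DB so the scheduler skips it. The dag_id and task_id are the hypothetical ones from the question, the execution_date is an example value for the run to skip, and this assumes a machine that can both load the DAG and reach the metadata DB.

```python
from datetime import datetime

from airflow import settings
from airflow.models import DagBag, TaskInstance
from airflow.utils.state import State

dagbag = DagBag()                      # parses the DAGs folder on this machine
task = dagbag.get_dag('abc').get_task('xyz')

# execution_date of the run you want to skip (example value)
execution_date = datetime(2020, 6, 27, 7, 0, 0)

session = settings.Session()
ti = TaskInstance(task, execution_date)
ti.state = State.SUCCESS
session.merge(ti)                      # inserts the row if it does not exist yet
session.commit()
session.close()
```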