airflow

airflow (to be tested)

Submitted by 拜拜、爱过 on 2019-12-01 20:28:28
#coding=utf-8
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
import airflow.utils

# Define the default arguments
default_args = {
    'owner': 'airflow',                             # owner name
    'start_date': airflow.utils.dates.days_ago(1),  # time of the first run (UTC); for easier testing this is usually set to the current time minus one schedule interval
    'email': ['lshan523@163.com'],                  # list of notification email recipients
    'email_on_failure': True,                       # whether to email when a task fails
    'email_on_retry': True,                         # whether to email when a task retries
    'retries': 3,                                   # number of retries on failure
    'retry_delay': timedelta(seconds=5)             # delay between retries
}

# Define the DAG
dag = DAG(
    dag_id='hello_world_args',      # dag_id
    default_args=default_args,      # use the default arguments
    #
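The excerpt cuts off inside the DAG() call. A minimal sketch of how such a definition is typically completed with a PythonOperator task, where the schedule_interval, the print_hello callable, and the task_id are illustrative assumptions rather than part of the original post:

from datetime import timedelta
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
import airflow.utils


def print_hello():
    # Illustrative callable; the original post does not show the task body.
    print('hello world')


default_args = {
    'owner': 'airflow',
    'start_date': airflow.utils.dates.days_ago(1),
    'retries': 3,
    'retry_delay': timedelta(seconds=5),
}

dag = DAG(
    dag_id='hello_world_args',
    default_args=default_args,
    schedule_interval=timedelta(days=1),  # assumed daily schedule
)

hello_task = PythonOperator(
    task_id='print_hello',        # illustrative task id
    python_callable=print_hello,
    dag=dag,
)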

Airflow 1.10 - Scheduler Startup Fails

Submitted by 偶尔善良 on 2019-12-01 20:18:59
I've just painfully installed Airflow 1.10 thanks to my previous post here. We have a single ec2-instance running, our queue is AWS ElastiCache for Redis, and our metadata database is AWS RDS for PostgreSQL. Airflow works with this setup just fine on version 1.9, but on version 1.10 we hit an issue when we start the scheduler:
[2018-08-15 16:29:14,015] {jobs.py:385} INFO - Started process (PID=15778) to work on /home/ec2-user/airflow/dags/myDag.py
[2018-08-15 16:29:14,055] {jobs.py:1782} INFO - Processing file /home/ec2-user/airflow/dags/myDag.py

Scheduling dag runs in Airflow

Submitted by ↘锁芯ラ on 2019-12-01 18:41:22
Got a general query on Airflow: is it possible to have a dag scheduled based on another dag's schedule? For example, if I have 2 dags, dag1 and dag2, I am trying to see if I can have dag2 run each time dag1 succeeds, and otherwise not run at all. Is this possible in Airflow? You will want to add a TriggerDagRunOperator at the end of dag1 and set the schedule of dag2 to None. In addition, if you want to handle multiple cases for the output of dag1, you can add a BranchPythonOperator to create multiple paths based on its output. For example, you could set it to either execute the
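A minimal sketch of the trigger pattern described above, assuming Airflow 1.x's airflow.operators.dagrun_operator, the dag_ids 'dag1' and 'dag2', and purely illustrative task bodies:

from datetime import datetime
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.dagrun_operator import TriggerDagRunOperator

default_args = {'owner': 'airflow', 'start_date': datetime(2019, 1, 1)}

# dag2 has no schedule of its own: it only runs when dag1 triggers it.
dag2 = DAG(dag_id='dag2', default_args=default_args, schedule_interval=None)
dag2_work = DummyOperator(task_id='dag2_work', dag=dag2)

# dag1 ends with a trigger task; downstream tasks only run when their
# upstream tasks succeed, so dag2 is triggered only when dag1 succeeded.
dag1 = DAG(dag_id='dag1', default_args=default_args, schedule_interval='@daily')
dag1_work = DummyOperator(task_id='dag1_work', dag=dag1)


def always_trigger(context, dag_run_obj):
    # Returning the dag_run_obj tells the operator to go ahead and trigger.
    return dag_run_obj


trigger_dag2 = TriggerDagRunOperator(
    task_id='trigger_dag2',
    trigger_dag_id='dag2',
    python_callable=always_trigger,
    dag=dag1,
)

dag1_work >> trigger_dag2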

Google Cloud Composer and Google Cloud SQL

Submitted by 和自甴很熟 on 2019-12-01 18:06:11
What ways do we have available to connect to a Google Cloud SQL (MySQL) instance from the newly introduced Google Cloud Composer? The intention is to get data from a Cloud SQL instance into BigQuery (perhaps with an intermediary step through Cloud Storage). Can the Cloud SQL proxy be exposed in some way on pods that are part of the Kubernetes cluster hosting Composer? If not, can the Cloud SQL Proxy be brought in by using the Kubernetes Service Broker? -> https://cloud.google.com/kubernetes-engine/docs/concepts/add-on/service-broker Should Airflow be used to schedule and call GCP API commands like 1)
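Assuming the Cloud SQL instance is reachable from the Composer workers (for example via the Cloud SQL proxy), one possible shape of the Cloud SQL -> Cloud Storage -> BigQuery pipeline the question describes, using the Airflow 1.x contrib operators MySqlToGoogleCloudStorageOperator and GoogleCloudStorageToBigQueryOperator; the connection id, bucket, query, schema, and table names are all illustrative placeholders:

from datetime import datetime
from airflow import DAG
from airflow.contrib.operators.mysql_to_gcs import MySqlToGoogleCloudStorageOperator
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator

dag = DAG(
    dag_id='cloudsql_to_bigquery',        # illustrative dag_id
    start_date=datetime(2019, 1, 1),
    schedule_interval='@daily',
)

# Export a table from Cloud SQL (MySQL), reached through an assumed Airflow
# connection named 'cloudsql_mysql', into newline-delimited JSON in a
# Cloud Storage staging bucket.
export_to_gcs = MySqlToGoogleCloudStorageOperator(
    task_id='export_to_gcs',
    mysql_conn_id='cloudsql_mysql',       # assumed connection id
    sql='SELECT * FROM my_table',         # illustrative query
    bucket='my-staging-bucket',           # illustrative bucket
    filename='exports/my_table_{}.json',  # {} is replaced with a chunk index
    dag=dag,
)

# Load the staged files into BigQuery.
load_to_bq = GoogleCloudStorageToBigQueryOperator(
    task_id='load_to_bq',
    bucket='my-staging-bucket',
    source_objects=['exports/my_table_*.json'],
    destination_project_dataset_table='my_project.my_dataset.my_table',  # illustrative
    source_format='NEWLINE_DELIMITED_JSON',
    schema_fields=[{'name': 'id', 'type': 'INTEGER', 'mode': 'NULLABLE'}],  # placeholder schema
    write_disposition='WRITE_TRUNCATE',
    dag=dag,
)

export_to_gcs >> load_to_bq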

Airflow: Tasks queued but not running

Submitted by 泄露秘密 on 2019-12-01 17:25:18
I am new to Airflow and trying to set it up to run ETL pipelines. I was able to install airflow, postgres, celery and rabbitmq, and I can test-run the tutorial dag. When I try to schedule jobs, the scheduler picks them up and queues them, which I can see in the UI, but the tasks never run. Could somebody help me fix this issue? I believe I am missing a very basic Airflow concept here. Here is my airflow.cfg:
[core]
airflow_home = /root/airflow
dags_folder = /root/airflow/dags
base_log_folder = /root/airflow/logs
executor = CeleryExecutor
sql_alchemy_conn =

Airflow celery workers will be blocked if the number of sensors is larger than concurrency?

Submitted by 可紊 on 2019-12-01 16:25:05
Let's say I set celery concurrency to n, but I have m (m > n) ExternalTaskSensors in a dag, each checking another dag named do_sth. These ExternalTaskSensors will consume all the celery workers, so in fact nothing gets done. But I can't set concurrency too high (like 2*m), because dag do_sth may start too many processes, which will lead to running out of memory. I am confused about what number I should set celery concurrency to. In ETL best practices with Airflow's Gotchas section the author addresses this general problem. One of the suggestions is to set up a pool for your sensor tasks so that your other
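A sketch of that pool suggestion, assuming a pool named 'sensor_pool' has already been created (Admin -> Pools in the UI) with fewer slots than the celery concurrency, and using illustrative dag and task ids:

from datetime import datetime
from airflow import DAG
from airflow.sensors.external_task_sensor import ExternalTaskSensor

dag = DAG(
    dag_id='waits_for_do_sth',         # illustrative dag_id
    start_date=datetime(2019, 1, 1),
    schedule_interval='@daily',
)

# Each sensor is assigned to the limited pool, so at most the pool's slot
# count of sensors can occupy celery workers at once; the remaining worker
# slots stay free for the real tasks.
wait_for_do_sth = ExternalTaskSensor(
    task_id='wait_for_do_sth',
    external_dag_id='do_sth',
    external_task_id='final_task',     # illustrative task inside do_sth
    pool='sensor_pool',                # pool created beforehand in the UI
    poke_interval=60,
    timeout=60 * 60,
    dag=dag,
)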

Airflow External sensor gets stuck at poking

Submitted by 三世轮回 on 2019-12-01 16:20:48
I want one dag to start after the completion of another dag. One solution is to use the external task sensor; below you can find my solution. The problem I encounter is that the dependent dag gets stuck at poking. I checked this answer and made sure that both dags run on the same schedule. My simplified code is as follows; any help would be appreciated. Leader dag:
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2015, 6, 1),
    'retries': 1
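For reference, a minimal sketch of the dependent dag, assuming Airflow 1.10's airflow.sensors.external_task_sensor module and that both dags share the same schedule_interval, so the sensor pokes for a matching execution_date; the dag_id, task ids, and timeout are illustrative, and if the schedules differ, execution_delta or execution_date_fn would have to be set accordingly:

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.sensors.external_task_sensor import ExternalTaskSensor

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2015, 6, 1),
    'retries': 1,
}

# Must use the same schedule_interval as the leader dag, otherwise the
# sensor pokes for an execution_date that never exists and stays stuck.
dependent_dag = DAG(
    dag_id='dependent_dag',            # illustrative dag_id
    default_args=default_args,
    schedule_interval=timedelta(days=1),
)

wait_for_leader = ExternalTaskSensor(
    task_id='wait_for_leader',
    external_dag_id='leader_dag',      # illustrative: the leader dag's dag_id
    external_task_id='leader_task',    # illustrative: the leader dag's last task
    timeout=60 * 60,
    dag=dependent_dag,
)

run_after_leader = BashOperator(
    task_id='run_after_leader',
    bash_command='echo "leader finished"',
    dag=dependent_dag,
)

wait_for_leader >> run_after_leader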

Airflow authentication setup fails with “AttributeError: can't set attribute”

Submitted by 你说的曾经没有我的故事 on 2019-12-01 15:55:00
The Airflow 1.8 password authentication setup described in the docs fails at the step user.password = 'set_the_password' with the error AttributeError: can't set attribute. It's better to simply use PasswordUser's _set_password attribute instead:
# Instead of user.password = 'password'
user._set_password = 'password'
This is due to an update of SQLAlchemy to a version >= 1.2 that introduced a backwards-incompatible change. You can fix this by explicitly installing a SQLAlchemy version < 1.2:
pip install 'sqlalchemy<1.2'
Or in a requirements.txt:
sqlalchemy<1.2
Fixed with pip install
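Putting the workaround in context, a sketch of the Airflow 1.8 docs' user-creation snippet with the _set_password workaround applied; the username, email, and password values are placeholders:

from airflow import models, settings
from airflow.contrib.auth.backends.password_auth import PasswordUser

user = PasswordUser(models.User())
user.username = 'admin'                # placeholder username
user.email = 'admin@example.com'       # placeholder email
# Workaround: with SQLAlchemy >= 1.2 the `password` property can no longer
# be assigned directly, so set the underlying attribute instead.
user._set_password = 'set_the_password'

session = settings.Session()
session.add(user)
session.commit()
session.close()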