airflow

setting up s3 for logs in airflow

Submitted by 生来就可爱ヽ(ⅴ<●) on 2020-01-09 02:19:24
Question: I am using docker-compose to set up a scalable Airflow cluster. I based my approach on this Dockerfile: https://hub.docker.com/r/puckel/docker-airflow/ My problem is getting the logs set up to write to/read from S3. When a DAG has completed, I get an error like this: *** Log file isn't local. *** Fetching here: http://ea43d4d49f35:8793/log/xxxxxxx/2017-06-26T11:00:00 *** Failed to fetch log file from worker. *** Reading remote logs... Could not read logs from s3://buckets/xxxxxxx/airflow/logs
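For reference, a minimal sketch of the remote-logging settings involved, assuming an Airflow 1.10-style airflow.cfg; the bucket path and the connection id are placeholders for values you create yourself, and the same settings must be present on the webserver, scheduler and every worker container so the webserver can read back what the workers wrote:

    [core]
    # Ship task logs to remote storage (Airflow 1.10+)
    remote_logging = True
    # Folder task logs are written to and read from (placeholder bucket)
    remote_base_log_folder = s3://my-log-bucket/airflow/logs
    # Airflow connection holding the AWS credentials (placeholder id)
    remote_log_conn_id = s3_logs
    encrypt_s3_logs = False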

Airflow: how to delete a DAG?

Submitted by 核能气质少年 on 2020-01-08 19:40:32
Question: I have started the Airflow webserver and scheduled some DAGs. I can see the DAGs in the web GUI. How can I delete a particular DAG so that it is no longer run or shown in the web GUI? Is there an Airflow CLI command to do that? I looked around but could not find an answer for a simple way of deleting a DAG once it has been loaded and scheduled. Answer 1: Edit 8/27/18 - Airflow 1.10 is now released on PyPI! https://pypi.org/project/apache-airflow/1.10.0/ How to delete a DAG completely We have this feature now in
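For reference, the Airflow 1.10 CLI exposes this as a command; a minimal usage example, assuming my_dag_id is the DAG to remove (the DAG's .py file must also be removed from the dags folder, otherwise the scheduler will re-add it):

    # Deletes all metadata for the given DAG id (runs, task instances, etc.)
    airflow delete_dag my_dag_id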

Airflow deployment and usage examples

Submitted by 亡梦爱人 on 2020-01-08 12:11:27
Reference: https://www.jianshu.com/p/089c56b4ec14
Airflow introduction: https://lxwei.github.io/posts/airflow%E4%BB%8B%E7%BB%8D.html
Python tutorial: https://airflow.apache.org/tutorial.html

Airflow scheduling: once the scheduler has loaded the DAG files, it immediately creates a batch of DAG run instances. The execution dates of these instances are all the points between start_time and the current time of the form start_time + n*schedule_interval (including start_time itself); for every such point, the scheduler makes sure a corresponding DAG run exists in the database. (You can set catchup=False so that execution dates earlier than now are not scheduled; otherwise, note that the guarantee is only of existence: a run that already exists in the database is not created again.) The next time the scheduler loads the DAG file, it recomputes the start_time written in the code and regenerates the instances that need to run; runs for earlier execution dates that already exist in the database are not created again. A minimal sketch of this catchup behaviour follows below.
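A minimal sketch of the catchup behaviour described above (the DAG id, dates and interval are illustrative):

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    # With catchup=False the scheduler only creates a run for the most recent
    # schedule_interval slot instead of backfilling every slot since start_date.
    dag = DAG(
        dag_id='catchup_demo',
        start_date=datetime(2020, 1, 1),
        schedule_interval='@daily',
        catchup=False,
    )

    task = DummyOperator(task_id='noop', dag=dag)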

Template_searchpath gives TemplateNotFound error in Airflow and cannot find the SQL script

Submitted by 瘦欲@ on 2020-01-06 15:27:06
Question: I have a DAG described like this: tmpl_search_path = '/home/airflow/gcs/sql_requests/' with DAG(dag_id='pipeline', default_args=default_args, template_searchpath = [tmpl_search_path]) as dag: create_table = bigquery_operator.BigQueryOperator( task_id = 'create_table', sql = 'create_table.sql', use_legacy_sql = False, destination_dataset_table = some_table) The task create_table calls a SQL script create_table.sql. This SQL script is not in the same folder as the DAG folder: it is in a
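A minimal sketch of how template_searchpath is usually wired up, assuming the SQL file lives under /home/airflow/gcs/sql_requests/; the DAG id, dates and destination table are placeholders taken from the question:

    from datetime import datetime
    from airflow import DAG
    from airflow.contrib.operators import bigquery_operator

    tmpl_search_path = '/home/airflow/gcs/sql_requests/'
    default_args = {'start_date': datetime(2020, 1, 1)}

    with DAG(dag_id='pipeline',
             default_args=default_args,
             # Every directory listed here is searched for templated files
             # such as 'create_table.sql'.
             template_searchpath=[tmpl_search_path]) as dag:
        create_table = bigquery_operator.BigQueryOperator(
            task_id='create_table',
            sql='create_table.sql',
            use_legacy_sql=False,
            destination_dataset_table='project.dataset.table',  # placeholder
        )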

Airflow on Docker - Path issue

Submitted by ≡放荡痞女 on 2020-01-06 08:10:45
Question: Working with Airflow, I am trying a simple DAG. I wrote custom operators and other files that I want to import into the main file where the DAG logic is. Here is the folder structure:

    ├── airflow.cfg
    ├── dags
    │   ├── __init__.py
    │   ├── dag.py
    │   └── sql_statements.sql
    ├── docker-compose.yaml
    ├── environment.yml
    └── plugins
        ├── __init__.py
        └── operators
            ├── __init__.py
            ├── facts_calculator.py
            ├── has_rows.py
            └── s3_to_redshift.py

I set up the volumes correctly in the compose file, since I can see them when I
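A hedged sketch of the volume mappings this kind of layout usually needs, assuming the puckel image's default AIRFLOW_HOME of /usr/local/airflow (service name and paths are illustrative):

    # docker-compose.yaml (excerpt)
    services:
      webserver:
        image: puckel/docker-airflow
        volumes:
          # DAG files become visible to the scheduler/webserver
          - ./dags:/usr/local/airflow/dags
          # Custom operators are picked up via the Airflow plugins mechanism
          - ./plugins:/usr/local/airflow/plugins

If the operators are registered through an AirflowPlugin subclass in plugins/__init__.py, the DAG file can then import them through the airflow.operators namespace rather than a relative path.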

How can one set a variable for use only during a certain dag_run

Submitted by 你离开我真会死。 on 2020-01-06 07:32:38
Question: How do I set a variable for use during a particular dag_run? I'm aware of setting values in XCom, but not all the operators that I use have XCom support. I also would not like to store the value in the Variables datastore, in case another DAG run that needs to store different values begins while the current one is running. Answer 1: The question is not clear, but from whatever I can infer, I'll try to clear your doubts. "not all the operators that I use have XCom support" - apparently you've mistaken
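For reference, a minimal sketch of scoping a value to a single dag_run with XCom from a PythonOperator, which is one way around operators that lack built-in XCom support (the task ids and the key name are illustrative):

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    def produce(**context):
        # Stored per task instance, i.e. per dag_run; other runs see their own value
        context['ti'].xcom_push(key='batch_id', value='run-specific-value')

    def consume(**context):
        value = context['ti'].xcom_pull(task_ids='produce', key='batch_id')
        print(value)

    dag = DAG('xcom_scope_demo', start_date=datetime(2020, 1, 1), schedule_interval=None)

    produce_task = PythonOperator(task_id='produce', python_callable=produce,
                                  provide_context=True, dag=dag)
    consume_task = PythonOperator(task_id='consume', python_callable=consume,
                                  provide_context=True, dag=dag)
    produce_task >> consume_task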

Setting up two-way SSL on Apache Airflow

Submitted by 三世轮回 on 2020-01-06 07:18:27
Question: I was able to achieve one-way SSL, but I'm getting stuck at two-way SSL in Airflow. Our requirement: App Engine communicates with Airflow to schedule jobs, and we are trying to secure these routes so that only the two of them can communicate securely, thus blocking anyone else from accessing these resources. Is this possible via SSH? If so, how do you achieve it? If this is not possible via SSH, what is a better way of achieving the same? Below is my Airflow config file: [core] # The home folder
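For reference, the webserver-side TLS options that exist in airflow.cfg cover one-way SSL only (the paths below are placeholders); client-certificate (two-way) verification is not a built-in webserver setting and is usually delegated to a reverse proxy such as nginx in front of Airflow:

    [webserver]
    # Serve the UI/API over HTTPS (one-way SSL)
    web_server_ssl_cert = /path/to/server.crt
    web_server_ssl_key = /path/to/server.key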

Dynamically Creating DAGs Based on Rows Available from a DB Connection

Submitted by 六眼飞鱼酱① on 2020-01-06 05:38:31
Question: I want to create DAGs dynamically from a database table query. When I try to create DAGs dynamically either from a range of exact numbers or based on objects available in the Airflow settings, it succeeds. However, when I try to use a PostgresHook and create a DAG for each row of my table, I can see a new DAG generated whenever I add a new row to my table, but it turns out that I can't click the newly created DAG in my Airflow web server UI. For more context, I'm
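A minimal sketch of the pattern that usually makes dynamically generated DAGs clickable: each generated DAG object must end up as a module-level global in the DAG file so that the scheduler and the webserver both discover it (the row-fetching function below is a placeholder standing in for the PostgresHook query):

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    def fetch_rows():
        # Placeholder for the PostgresHook query described in the question
        return ['customer_a', 'customer_b']

    for row in fetch_rows():
        dag_id = 'generated_{}'.format(row)
        dag = DAG(dag_id, start_date=datetime(2020, 1, 1), schedule_interval='@daily')
        DummyOperator(task_id='start', dag=dag)
        # Registering the DAG in the module namespace is what makes it visible
        # (and clickable) in the web UI, not just listed by the scheduler.
        globals()[dag_id] = dag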

airflow error:AttributeError: module 'airflow.utils.log' has no attribute 'file_processor_handler'

Submitted by 醉酒当歌 on 2020-01-06 05:30:10
Question: My local Airflow instance was up and running, but now when I run airflow webserver or any other airflow command I get the error below (with some traceback): Unable to load the config, contains a configuration error. Traceback (most recent call last): File "/anaconda3/lib/python3.6/logging/config.py", line 382, in resolve found = getattr(found, frag) AttributeError: module 'airflow.utils.log' has no attribute 'file_processor_handler' During handling of the above exception, another exception
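This traceback usually points at a stale custom logging configuration; a hedged place to look is the logging setting in airflow.cfg, since a logging_config_class that references handlers removed or renamed in the installed Airflow version produces exactly this kind of AttributeError (the value shown is only illustrative of what may currently be configured):

    [core]
    # If this points at a custom module (e.g. log_config.LOGGING_CONFIG),
    # either update that module to match the installed Airflow version
    # or leave the value empty to fall back to the default logging config.
    logging_config_class =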