airflow

Airflow Installation and Usage

岁酱吖の submitted on 2020-03-14 14:37:28
# Airflow 1.10+ Installation

This installation uses Airflow 1.10+, which depends on Python and a database; MySQL is the database used here. Components and versions for this installation:

Airflow == 1.10.0
Python == 3.6.5
MySQL == 5.7

# Overall workflow

1. Create the tables
2. Install
3. Configure
4. Run
5. Configure tasks

```
# Start the scheduler
airflow scheduler -D
# Start the webserver
airflow webserver -D
# Stop the webserver / stop all airflow processes
ps -ef | grep -Ei '(airflow-webserver)' | grep master | awk '{print $2}' | xargs -i kill {}
ps -ef | grep -Ei 'airflow' | grep -v 'grep' | awk '{print $2}' | xargs -i kill {}
```

## Create the database and user

The database is named airflow:

create database airflow;

Create a user named airflow that is allowed to connect from any IP:

create user 'airflow'@'%' identified by 'airflow';
create user 'airflow'@'localhost' identified by 'airflow'
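
As a quick smoke test once the scheduler and webserver are running, a minimal DAG can be dropped into the configured dags folder. This is an illustrative sketch rather than part of the original post; the DAG id, schedule, and echoed message are arbitrary, and the imports assume Airflow 1.10:

```
# Minimal smoke-test DAG for a fresh Airflow 1.10 install (illustrative sketch).
# Place this file in the configured dags_folder; all names here are arbitrary.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # Airflow 1.10 import path

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2020, 3, 1),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

# One DAG with a single task that just echoes a message into the task log.
with DAG('smoke_test', default_args=default_args, schedule_interval='@daily') as dag:
    say_hello = BashOperator(
        task_id='say_hello',
        bash_command='echo "airflow install looks OK"',
    )
```

If the DAG shows up in the web UI and the task succeeds, the scheduler, webserver, and metadata database are all wired together correctly.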

Airflow (Part 2): Integrating with EMR

岁酱吖の submitted on 2020-03-12 23:44:33
1. Preparation
1.1. Install and initialize Airflow, following this document: https://www.cnblogs.com/zackstang/p/11082322.html
In addition, install:
sudo pip-3.6 install -i https://pypi.tuna.tsinghua.edu.cn/simple 'apache-airflow[celery]'
sudo pip-3.6 install -i https://pypi.tuna.tsinghua.edu.cn/simple boto3
1.2. Configure local AWS credentials; these credentials must have permission to launch EMR clusters.
1.3. Switch the metadata database to an external database: edit the airflow.cfg file and change the database connection setting (the airflowdb database must be created in the database beforehand):
sql_alchemy_conn = mysql://user:password@database_location/airflowdb
Check and initialize with the following command:
airflow initdb
1.4. Set the executor to CeleryExecutor: edit the airflow.cfg file and change the executor setting:
executor = CeleryExecutor
After this change, tasks that have no dependencies on one another can run in parallel.
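
With boto3 and the Celery executor in place, a DAG can create an EMR cluster through Airflow's contrib EMR operators. The following is a minimal sketch rather than the post's own code; the job flow overrides, cluster sizing, and connection ids (aws_default, emr_default) are placeholder assumptions:

```
# Illustrative sketch: launching an EMR cluster with Airflow 1.10 contrib operators.
# JOB_FLOW_OVERRIDES and the connection ids below are placeholder assumptions.
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.emr_create_job_flow_operator import EmrCreateJobFlowOperator
from airflow.contrib.sensors.emr_job_flow_sensor import EmrJobFlowSensor

JOB_FLOW_OVERRIDES = {
    'Name': 'airflow-demo-cluster',
    'ReleaseLabel': 'emr-5.29.0',
    'Instances': {
        'InstanceGroups': [
            {'Name': 'Master', 'InstanceRole': 'MASTER',
             'InstanceType': 'm5.xlarge', 'InstanceCount': 1},
        ],
        'KeepJobFlowAliveWhenNoSteps': False,
        'TerminationProtected': False,
    },
    'JobFlowRole': 'EMR_EC2_DefaultRole',
    'ServiceRole': 'EMR_DefaultRole',
}

with DAG('emr_demo', start_date=datetime(2020, 3, 1), schedule_interval=None) as dag:
    # Create the cluster using the credentials configured for aws_default.
    create_cluster = EmrCreateJobFlowOperator(
        task_id='create_emr_cluster',
        job_flow_overrides=JOB_FLOW_OVERRIDES,
        aws_conn_id='aws_default',
        emr_conn_id='emr_default',
    )

    # The operator pushes the job flow id to XCom; the sensor waits for completion.
    wait_for_cluster = EmrJobFlowSensor(
        task_id='wait_for_emr',
        job_flow_id="{{ task_instance.xcom_pull(task_ids='create_emr_cluster', key='return_value') }}",
        aws_conn_id='aws_default',
    )

    create_cluster >> wait_for_cluster
```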

Billing on BigQuery

霸气de小男生 submitted on 2020-03-05 03:24:11
Question: Hi, I have installed Airflow on Docker. What I see now is that the callback set as a default doesn't work: when a job fails, it doesn't call the function mentioned. I have to add the callback to each and every task, and then it works. Do you recognise this issue? What is the solution? default_args['on_failure_callback'] = slack_failed_task_callback Further, I noticed that Bash environment variables which I have set are not inherited by the PythonOperator (the PythonOperator is derived from the BashOperator, if I am not wrong).
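
For reference, the usual way to share one failure callback across all tasks is to put it in default_args, which the DAG hands to every operator it constructs. Below is a minimal sketch, not the asker's setup: slack_failed_task_callback here is a placeholder that just prints, standing in for a real Slack notification:

```
# Sketch: sharing one on_failure_callback across all tasks via default_args.
# slack_failed_task_callback is a placeholder; a real version would post to Slack.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator


def slack_failed_task_callback(context):
    # Airflow passes the task context dict to the callback when a task fails.
    print("Task failed: %s" % context['task_instance_key_str'])


default_args = {
    'owner': 'airflow',
    'start_date': datetime(2020, 3, 1),
    # Every task created inside the DAG inherits this callback from default_args.
    'on_failure_callback': slack_failed_task_callback,
}

with DAG('callback_demo', default_args=default_args, schedule_interval=None) as dag:
    always_fails = BashOperator(
        task_id='always_fails',
        bash_command='exit 1',  # force a failure so the callback fires
    )
```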

Airflow resets environment variables while running BashOperator

Deadly submitted on 2020-03-04 23:07:26
Question: With one of my Airflow tasks, I have an environment variable issue.
[2019-08-19 04:51:04,603] {{bash_operator.py:127}} INFO - File "/home/ubuntu/.pyenv/versions/3.6.7/lib/python3.6/os.py", line 669, in __getitem__
[2019-08-19 04:51:04,603] {{bash_operator.py:127}} INFO - raise KeyError(key) from None
[2019-08-19 04:51:04,603] {{bash_operator.py:127}} INFO - KeyError: 'HOME'
[2019-08-19 04:51:04,639] {{bash_operator.py:131}} INFO - Command exited with return code 1
And my task is the following:
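
A KeyError: 'HOME' like this is typical of a BashOperator that is given an explicit env mapping, because env replaces the inherited environment instead of extending it. The following is a hedged sketch of the usual workaround, merging os.environ with the extra variables; the DAG and variable names are placeholders, not the asker's task:

```
# Sketch: preserving the parent environment when passing env to BashOperator.
# When env is set, the bash command sees ONLY those variables, so HOME and
# friends disappear unless they are merged back in explicitly.
import os
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

with DAG('env_demo', start_date=datetime(2020, 3, 1), schedule_interval=None) as dag:
    print_home = BashOperator(
        task_id='print_home',
        bash_command='echo "HOME is $HOME and MY_VAR is $MY_VAR"',
        # Merge the inherited environment with the extra variable; dropping
        # os.environ from this dict reproduces the KeyError: 'HOME' above.
        env={**os.environ, 'MY_VAR': 'some_value'},
    )
```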

Unable to specify master_type in MLEngineTrainingOperator

徘徊边缘 submitted on 2020-03-02 09:37:36
Question: I am using Airflow to schedule a pipeline that will result in training a scikit-learn model with AI Platform. I use this DAG to train it:
with models.DAG(JOB_NAME, schedule_interval=None, default_args=default_args) as dag:
    # Tasks definition
    training_op = MLEngineTrainingOperator(
        task_id='submit_job_for_training',
        project_id=PROJECT,
        job_id=job_id,
        package_uris=[os.path.join(TRAINER_BIN)],
        training_python_module=TRAINER_MODULE,
        runtime_version=RUNTIME_VERSION,
        region='europe-west1',
        training
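
The excerpt above is cut off mid-definition. Purely as an illustration of where master_type fits, the sketch below assumes an Airflow release whose MLEngineTrainingOperator accepts master_type together with scale_tier='CUSTOM' (older 1.10.x releases do not have the argument); every constant is a placeholder rather than the asker's value:

```
# Illustrative sketch only: assumes an Airflow version whose MLEngineTrainingOperator
# accepts master_type (older 1.10.x releases do not). All constants are placeholders.
import os
from datetime import datetime

from airflow import models
from airflow.contrib.operators.mlengine_operator import MLEngineTrainingOperator

PROJECT = 'my-gcp-project'
TRAINER_BIN = 'gs://my-bucket/packages/trainer-0.1.tar.gz'
TRAINER_MODULE = 'trainer.task'
RUNTIME_VERSION = '1.15'

default_args = {'start_date': datetime(2020, 3, 1)}

with models.DAG('ml_engine_training_demo',
                schedule_interval=None,
                default_args=default_args) as dag:
    training_op = MLEngineTrainingOperator(
        task_id='submit_job_for_training',
        project_id=PROJECT,
        job_id='training_{{ ds_nodash }}',
        package_uris=[os.path.join(TRAINER_BIN)],
        training_python_module=TRAINER_MODULE,
        training_args=[],
        runtime_version=RUNTIME_VERSION,
        region='europe-west1',
        # A custom master machine type requires the CUSTOM scale tier.
        scale_tier='CUSTOM',
        master_type='n1-highmem-8',
    )
```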

Cannot run Apache Airflow after a fresh install: Python import error

牧云@^-^@ submitted on 2020-02-27 14:23:08
Question: After a fresh install using pip install apache-airflow, any attempt to run airflow ends with a Python import error:
Traceback (most recent call last):
File "/Users/\*/env/bin/airflow", line 26, in <module>
from airflow.bin.cli import CLIFactory
File "/Users/\*/env/lib/python3.7/site-packages/airflow/bin/cli.py", line 70, in <module>
from airflow.www.app import (cached_app, create_app)
File "/Users/\*/env/lib/python3.7/site-packages/airflow/www/app.py", line 26, in <module>
from flask_wtf

How to pull the client logs of Spark jobs submitted via the Apache Livy batches POST method from Airflow

家住魔仙堡 submitted on 2020-02-25 05:38:05
Question: I am submitting a Spark job using the Apache Livy batches POST method. This HTTP request is sent from Airflow. After submitting the job, I track its status using the batch id. I want to show the driver (client) logs in the Airflow logs, to avoid having to check multiple places (Airflow and Apache Livy / the Resource Manager). Is this possible using the Apache Livy REST API? Answer 1: Livy has an endpoint to get logs: /sessions/{sessionId}/log and /batches/{batchId}/log . Documentation: https://livy.incubator.apache
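
As a rough illustration of that answer, an Airflow task can poll /batches/{batchId}/log and echo the lines into its own task log. The Livy host, paging values, and function names below are assumptions for the sketch, not code from the answer:

```
# Sketch: pulling Livy batch (driver/client) logs into an Airflow task log.
# The Livy URL and paging parameters are placeholder assumptions.
import requests

LIVY_URL = 'http://livy-host:8998'  # placeholder Livy endpoint


def fetch_livy_batch_log(batch_id, start=0, size=100):
    """Return one page of log lines for a batch via /batches/{batchId}/log."""
    response = requests.get(
        '{}/batches/{}/log'.format(LIVY_URL, batch_id),
        params={'from': start, 'size': size},
    )
    response.raise_for_status()
    # Livy returns the lines under the "log" key of the JSON body.
    return response.json().get('log', [])


def print_driver_logs(batch_id, **context):
    # Printing inside a PythonOperator callable ends up in the Airflow task log.
    for line in fetch_livy_batch_log(batch_id):
        print(line)
```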

Trigger Cloud Composer DAG with a Pub/Sub message

走远了吗. submitted on 2020-02-25 04:13:14
Question: I am trying to create a Cloud Composer DAG that is triggered by a Pub/Sub message. There is the following example from Google, which triggers a DAG every time a change occurs in a Cloud Storage bucket: https://cloud.google.com/composer/docs/how-to/using/triggering-with-gcf However, at the beginning they say you can trigger DAGs in response to events, such as a change in a Cloud Storage bucket or a message pushed to Cloud Pub/Sub. I have spent a lot of time trying to figure out how that can be
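
The excerpt is truncated above. One common pattern, offered here only as a hedged sketch and not as the documented answer, is to reuse the Cloud Function from Google's example but deploy it with a Pub/Sub trigger instead of a Storage trigger, and have it POST to the Airflow 1.10 experimental REST API on the Composer webserver. The webserver URL, IAP client id, and DAG name below are placeholders, and the token flow is a simplified stand-in for the authentication code in Google's example:

```
# Sketch of a Pub/Sub-triggered Cloud Function that starts a Composer DAG by
# POSTing to the Airflow 1.10 experimental REST API. WEBSERVER_URL, CLIENT_ID,
# and DAG_NAME are placeholders; the IAP token handling is simplified.
import base64
import json

import google.auth.transport.requests
import google.oauth2.id_token
import requests

WEBSERVER_URL = 'https://example-tp.appspot.com'  # placeholder Composer webserver URL
CLIENT_ID = 'placeholder-client-id.apps.googleusercontent.com'  # placeholder IAP client id
DAG_NAME = 'my_dag'  # placeholder DAG id


def trigger_dag_from_pubsub(event, context):
    """Cloud Function entry point for a Pub/Sub trigger."""
    # Pub/Sub delivers the message base64-encoded in event['data'].
    payload = base64.b64decode(event['data']).decode('utf-8') if event.get('data') else ''

    # Obtain an OIDC token accepted by the IAP-protected Composer webserver.
    auth_request = google.auth.transport.requests.Request()
    id_token = google.oauth2.id_token.fetch_id_token(auth_request, CLIENT_ID)

    # Trigger the DAG, passing the Pub/Sub message through as run configuration.
    endpoint = '{}/api/experimental/dags/{}/dag_runs'.format(WEBSERVER_URL, DAG_NAME)
    response = requests.post(
        endpoint,
        headers={'Authorization': 'Bearer {}'.format(id_token)},
        json={'conf': json.dumps({'message': payload})},
    )
    response.raise_for_status()
```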

Workflow scheduling on GCP Dataproc cluster

落爺英雄遲暮 submitted on 2020-02-24 03:56:08
Question: I have some complex Oozie workflows to migrate from on-prem Hadoop to GCP Dataproc. The workflows consist of shell scripts, Python scripts, Spark-Scala jobs, Sqoop jobs, etc. I have come across some potential solutions covering my workflow scheduling needs:
Cloud Composer
Dataproc Workflow Templates with Cloud Scheduler
Installing Oozie on a Dataproc auto-scaling cluster
Please let me know which option would be most efficient in terms of performance, cost, and migration complexity. Answer 1: All 3
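
The answer excerpt is cut short above. To give a feel for the Cloud Composer option, a migrated Oozie workflow typically becomes a DAG that creates an ephemeral Dataproc cluster, runs the jobs, and tears the cluster down. The sketch below uses the Airflow 1.10 contrib Dataproc operators with placeholder project, zone, region, and job values; it is an illustration, not part of the answer:

```
# Sketch of an Oozie-style workflow as a Composer (Airflow 1.10) DAG:
# create a Dataproc cluster, run a PySpark job, then delete the cluster.
# Project, zone, region, bucket, and file names are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.dataproc_operator import (
    DataprocClusterCreateOperator,
    DataProcPySparkOperator,
    DataprocClusterDeleteOperator,
)
from airflow.utils.trigger_rule import TriggerRule

PROJECT_ID = 'my-gcp-project'
REGION = 'europe-west1'
CLUSTER_NAME = 'ephemeral-etl-cluster'

with DAG('dataproc_workflow_demo',
         start_date=datetime(2020, 2, 1),
         schedule_interval='@daily') as dag:

    create_cluster = DataprocClusterCreateOperator(
        task_id='create_cluster',
        project_id=PROJECT_ID,
        cluster_name=CLUSTER_NAME,
        num_workers=2,
        zone='europe-west1-b',
        region=REGION,
    )

    run_pyspark = DataProcPySparkOperator(
        task_id='run_pyspark_job',
        main='gs://my-bucket/jobs/etl_job.py',
        cluster_name=CLUSTER_NAME,
        region=REGION,
    )

    delete_cluster = DataprocClusterDeleteOperator(
        task_id='delete_cluster',
        project_id=PROJECT_ID,
        cluster_name=CLUSTER_NAME,
        region=REGION,
        # Tear the cluster down even if the PySpark job failed.
        trigger_rule=TriggerRule.ALL_DONE,
    )

    create_cluster >> run_pyspark >> delete_cluster
```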

Lessons learned from using Airflow for scheduling

ぐ巨炮叔叔 submitted on 2020-02-23 19:12:35
It has been nearly two months since I first came into contact with Airflow and put it into production. From deployment to development to operations and tuning, I ran into all sorts of problems along the way, and went from having merely heard of Airflow to being familiar with it. This post covers three areas.
I. Installing and deploying Airflow
Installing and using the Airflow scheduling tool
I. Install Airflow
1. Environment preparation:
1.1. Install the MySQL database
Unpack the MariaDB package:
tar -xzvf mariadb-10.2.14-linux-x86_64.tar.gz
cd mariadb-10.2.14-linux-x86_64
Configure MySQL: edit the my.cnf configuration file according to your needs.
Start MySQL:
Initialize the MySQL database: scripts/mysql_install_db
Start MySQL: nohup bin/mysqld --defaults-file=my.cnf &
Reset the MySQL root password:
bin/mysqladmin -u root password "123456"
bin/mysql -u root -p, then enter the password to log in
1.2. Create the Airflow database and user
Create the database:
mysql> create database airflow;
Create the user:
mysql> create user 'airflow'@'%' identified by 'airflow';
mysql> create user 'airflow'@
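
(The excerpt cuts off mid-statement above.) Before pointing Airflow's sql_alchemy_conn at the newly created database, it can be worth sanity-checking that the airflow user actually connects. Here is a small sketch using SQLAlchemy, assuming the host, credentials, and database name created above:

```
# Sketch: verify the freshly created airflow MySQL user and database before
# running `airflow initdb`. Credentials mirror the setup above; adjust as needed.
from sqlalchemy import create_engine

# Same URI format that goes into sql_alchemy_conn in airflow.cfg.
engine = create_engine('mysql://airflow:airflow@localhost/airflow')

with engine.connect() as connection:
    # A trivial round-trip proves the credentials and database exist.
    result = connection.execute('SELECT 1')
    print(result.scalar())
```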