airflow

Airflow Installation and Usage

岁酱吖の submitted on 2020-03-14 14:37:28
# Airflow 1.10+ Installation

This installation uses Airflow 1.10+, which depends on Python and a database; MySQL is the database used here. Components and versions for this installation:

Airflow == 1.10.0
Python == 3.6.5
MySQL == 5.7

# Overall workflow

1. Create the tables
2. Install
3. Configure
4. Run
5. Configure tasks

```
# Start the scheduler
airflow scheduler -D
# Start the webserver
airflow webserver -D
# Stop the webserver / stop all airflow processes
ps -ef | grep -Ei '(airflow-webserver)' | grep master | awk '{print $2}' | xargs -i kill {}
ps -ef | grep -Ei 'airflow' | grep -v 'grep' | awk '{print $2}' | xargs -i kill {}
```

## Create the database and user

The database is named airflow:

create database airflow;

Create a user named airflow that is allowed to connect from any IP:

create user 'airflow'@'%' identified by 'airflow';
create user 'airflow'@'localhost' identified by 'airflow'
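
As a quick smoke test once the scheduler and webserver are running, a minimal DAG can be dropped into the configured dags folder. This is an illustrative sketch rather than part of the original post; the DAG id, schedule, and echoed message are arbitrary, and the imports assume Airflow 1.10:

```
# Minimal smoke-test DAG for a fresh Airflow 1.10 install (illustrative sketch).
# Place this file in the configured dags_folder; all names here are arbitrary.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # Airflow 1.10 import path

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2020, 3, 1),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

# One DAG with a single task that just echoes a message into the task log.
with DAG('smoke_test', default_args=default_args, schedule_interval='@daily') as dag:
    say_hello = BashOperator(
        task_id='say_hello',
        bash_command='echo "airflow install looks OK"',
    )
```

If the DAG shows up in the web UI and the task succeeds, the scheduler, webserver, and metadata database are all wired together correctly.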

Airflow (Part 2): Integrating with EMR

岁酱吖の submitted on 2020-03-12 23:44:33
1. Preparation
1.1. Install and initialize Airflow, following this document: https://www.cnblogs.com/zackstang/p/11082322.html
In addition, install:
sudo pip-3.6 install -i https://pypi.tuna.tsinghua.edu.cn/simple 'apache-airflow[celery]'
sudo pip-3.6 install -i https://pypi.tuna.tsinghua.edu.cn/simple boto3
1.2. Configure local AWS credentials; these credentials must have permission to launch EMR clusters.
1.3. Switch the metadata database to an external database: edit the airflow.cfg file and change the database connection setting (the airflowdb database must be created in the database beforehand):
sql_alchemy_conn = mysql://user:password@database_location/airflowdb
Check and initialize with the following command:
airflow initdb
1.4. Set the executor to CeleryExecutor: edit the airflow.cfg file and change the executor setting:
executor = CeleryExecutor
After this change, tasks that have no dependencies on one another can run in parallel.
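
With boto3 and the Celery executor in place, a DAG can create an EMR cluster through Airflow's contrib EMR operators. The following is a minimal sketch rather than the post's own code; the job flow overrides, cluster sizing, and connection ids (aws_default, emr_default) are placeholder assumptions:

```
# Illustrative sketch: launching an EMR cluster with Airflow 1.10 contrib operators.
# JOB_FLOW_OVERRIDES and the connection ids below are placeholder assumptions.
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.emr_create_job_flow_operator import EmrCreateJobFlowOperator
from airflow.contrib.sensors.emr_job_flow_sensor import EmrJobFlowSensor

JOB_FLOW_OVERRIDES = {
    'Name': 'airflow-demo-cluster',
    'ReleaseLabel': 'emr-5.29.0',
    'Instances': {
        'InstanceGroups': [
            {'Name': 'Master', 'InstanceRole': 'MASTER',
             'InstanceType': 'm5.xlarge', 'InstanceCount': 1},
        ],
        'KeepJobFlowAliveWhenNoSteps': False,
        'TerminationProtected': False,
    },
    'JobFlowRole': 'EMR_EC2_DefaultRole',
    'ServiceRole': 'EMR_DefaultRole',
}

with DAG('emr_demo', start_date=datetime(2020, 3, 1), schedule_interval=None) as dag:
    # Create the cluster using the credentials configured for aws_default.
    create_cluster = EmrCreateJobFlowOperator(
        task_id='create_emr_cluster',
        job_flow_overrides=JOB_FLOW_OVERRIDES,
        aws_conn_id='aws_default',
        emr_conn_id='emr_default',
    )

    # The operator pushes the job flow id to XCom; the sensor waits for completion.
    wait_for_cluster = EmrJobFlowSensor(
        task_id='wait_for_emr',
        job_flow_id="{{ task_instance.xcom_pull(task_ids='create_emr_cluster', key='return_value') }}",
        aws_conn_id='aws_default',
    )

    create_cluster >> wait_for_cluster
```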

Billing on BigQuery

霸气de小男生 submitted on 2020-03-05 03:24:11
Question: Hi, I have installed Airflow on Docker. What I see now is that the callback set as a default doesn't work: when a job fails, it doesn't call the function mentioned. I have to add the callback to each and every task, and then it works. Do you recognise this issue? What is the solution? default_args['on_failure_callback'] = slack_failed_task_callback Further, I noticed that Bash environment variables which I have set are not inherited by the PythonOperator (the PythonOperator is derived from the BashOperator, if I am not wrong).
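
For reference, the usual way to share one failure callback across all tasks is to put it in default_args, which the DAG hands to every operator it constructs. Below is a minimal sketch, not the asker's setup: slack_failed_task_callback here is a placeholder that just prints, standing in for a real Slack notification:

```
# Sketch: sharing one on_failure_callback across all tasks via default_args.
# slack_failed_task_callback is a placeholder; a real version would post to Slack.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator


def slack_failed_task_callback(context):
    # Airflow passes the task context dict to the callback when a task fails.
    print("Task failed: %s" % context['task_instance_key_str'])


default_args = {
    'owner': 'airflow',
    'start_date': datetime(2020, 3, 1),
    # Every task created inside the DAG inherits this callback from default_args.
    'on_failure_callback': slack_failed_task_callback,
}

with DAG('callback_demo', default_args=default_args, schedule_interval=None) as dag:
    always_fails = BashOperator(
        task_id='always_fails',
        bash_command='exit 1',  # force a failure so the callback fires
    )
```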

Airflow resets environment variables while running BashOperator

Deadly submitted on 2020-03-04 23:07:26
Question: With one of my Airflow tasks, I have an environment variable issue.
[2019-08-19 04:51:04,603] {{bash_operator.py:127}} INFO - File "/home/ubuntu/.pyenv/versions/3.6.7/lib/python3.6/os.py", line 669, in __getitem__
[2019-08-19 04:51:04,603] {{bash_operator.py:127}} INFO - raise KeyError(key) from None
[2019-08-19 04:51:04,603] {{bash_operator.py:127}} INFO - KeyError: 'HOME'
[2019-08-19 04:51:04,639] {{bash_operator.py:131}} INFO - Command exited with return code 1
And my task is the following:
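
A KeyError: 'HOME' like this is typical of a BashOperator that is given an explicit env mapping, because env replaces the inherited environment instead of extending it. The following is a hedged sketch of the usual workaround, merging os.environ with the extra variables; the DAG and variable names are placeholders, not the asker's task:

```
# Sketch: preserving the parent environment when passing env to BashOperator.
# When env is set, the bash command sees ONLY those variables, so HOME and
# friends disappear unless they are merged back in explicitly.
import os
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

with DAG('env_demo', start_date=datetime(2020, 3, 1), schedule_interval=None) as dag:
    print_home = BashOperator(
        task_id='print_home',
        bash_command='echo "HOME is $HOME and MY_VAR is $MY_VAR"',
        # Merge the inherited environment with the extra variable; dropping
        # os.environ from this dict reproduces the KeyError: 'HOME' above.
        env={**os.environ, 'MY_VAR': 'some_value'},
    )
```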

Unable to specify master_type in MLEngineTrainingOperator

徘徊边缘 submitted on 2020-03-02 09:37:36
Question: I am using Airflow to schedule a pipeline that will result in training a scikit-learn model with AI Platform. I use this DAG to train it:
with models.DAG(JOB_NAME, schedule_interval=None, default_args=default_args) as dag:
    # Tasks definition
    training_op = MLEngineTrainingOperator(
        task_id='submit_job_for_training',
        project_id=PROJECT,
        job_id=job_id,
        package_uris=[os.path.join(TRAINER_BIN)],
        training_python_module=TRAINER_MODULE,
        runtime_version=RUNTIME_VERSION,
        region='europe-west1',
        training
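
The excerpt above is cut off mid-definition. Purely as an illustration of where master_type fits, the sketch below assumes an Airflow release whose MLEngineTrainingOperator accepts master_type together with scale_tier='CUSTOM' (older 1.10.x releases do not have the argument); every constant is a placeholder rather than the asker's value:

```
# Illustrative sketch only: assumes an Airflow version whose MLEngineTrainingOperator
# accepts master_type (older 1.10.x releases do not). All constants are placeholders.
import os
from datetime import datetime

from airflow import models
from airflow.contrib.operators.mlengine_operator import MLEngineTrainingOperator

PROJECT = 'my-gcp-project'
TRAINER_BIN = 'gs://my-bucket/packages/trainer-0.1.tar.gz'
TRAINER_MODULE = 'trainer.task'
RUNTIME_VERSION = '1.15'

default_args = {'start_date': datetime(2020, 3, 1)}

with models.DAG('ml_engine_training_demo',
                schedule_interval=None,
                default_args=default_args) as dag:
    training_op = MLEngineTrainingOperator(
        task_id='submit_job_for_training',
        project_id=PROJECT,
        job_id='training_{{ ds_nodash }}',
        package_uris=[os.path.join(TRAINER_BIN)],
        training_python_module=TRAINER_MODULE,
        training_args=[],
        runtime_version=RUNTIME_VERSION,
        region='europe-west1',
        # A custom master machine type requires the CUSTOM scale tier.
        scale_tier='CUSTOM',
        master_type='n1-highmem-8',
    )
```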

Cannot run Apache Airflow after a fresh install: Python import error

牧云@^-^@ submitted on 2020-02-27 14:23:08
Question: After a fresh install using pip install apache-airflow, any attempt to run airflow ends with a Python import error:
Traceback (most recent call last):
File "/Users/\*/env/bin/airflow", line 26, in <module>
from airflow.bin.cli import CLIFactory
File "/Users/\*/env/lib/python3.7/site-packages/airflow/bin/cli.py", line 70, in <module>
from airflow.www.app import (cached_app, create_app)
File "/Users/\*/env/lib/python3.7/site-packages/airflow/www/app.py", line 26, in <module>
from flask_wtf

How to pull the client logs of Spark jobs submitted via the Apache Livy batches POST method from Airflow

家住魔仙堡 submitted on 2020-02-25 05:38:05
Question: I am submitting a Spark job using the Apache Livy batches POST method. This HTTP request is sent from Airflow. After submitting the job, I track its status using the batch id. I want to show the driver (client) logs in the Airflow logs, to avoid having to check multiple places (Airflow and Apache Livy / the Resource Manager). Is this possible using the Apache Livy REST API? Answer 1: Livy has an endpoint to get logs: /sessions/{sessionId}/log and /batches/{batchId}/log . Documentation: https://livy.incubator.apache
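
As a rough illustration of that answer, an Airflow task can poll /batches/{batchId}/log and echo the lines into its own task log. The Livy host, paging values, and function names below are assumptions for the sketch, not code from the answer:

```
# Sketch: pulling Livy batch (driver/client) logs into an Airflow task log.
# The Livy URL and paging parameters are placeholder assumptions.
import requests

LIVY_URL = 'http://livy-host:8998'  # placeholder Livy endpoint


def fetch_livy_batch_log(batch_id, start=0, size=100):
    """Return one page of log lines for a batch via /batches/{batchId}/log."""
    response = requests.get(
        '{}/batches/{}/log'.format(LIVY_URL, batch_id),
        params={'from': start, 'size': size},
    )
    response.raise_for_status()
    # Livy returns the lines under the "log" key of the JSON body.
    return response.json().get('log', [])


def print_driver_logs(batch_id, **context):
    # Printing inside a PythonOperator callable ends up in the Airflow task log.
    for line in fetch_livy_batch_log(batch_id):
        print(line)
```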

Trigger Cloud Composer DAG with a Pub/Sub message

走远了吗. submitted on 2020-02-25 04:13:14
Question: I am trying to create a Cloud Composer DAG that is triggered by a Pub/Sub message. There is the following example from Google, which triggers a DAG every time a change occurs in a Cloud Storage bucket: https://cloud.google.com/composer/docs/how-to/using/triggering-with-gcf However, at the beginning they say you can trigger DAGs in response to events, such as a change in a Cloud Storage bucket or a message pushed to Cloud Pub/Sub. I have spent a lot of time trying to figure out how that can be
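
The excerpt is truncated above. One common pattern, offered here only as a hedged sketch and not as the documented answer, is to reuse the Cloud Function from Google's example but deploy it with a Pub/Sub trigger instead of a Storage trigger, and have it POST to the Airflow 1.10 experimental REST API on the Composer webserver. The webserver URL, IAP client id, and DAG name below are placeholders, and the token flow is a simplified stand-in for the authentication code in Google's example:

```
# Sketch of a Pub/Sub-triggered Cloud Function that starts a Composer DAG by
# POSTing to the Airflow 1.10 experimental REST API. WEBSERVER_URL, CLIENT_ID,
# and DAG_NAME are placeholders; the IAP token handling is simplified.
import base64
import json

import google.auth.transport.requests
import google.oauth2.id_token
import requests

WEBSERVER_URL = 'https://example-tp.appspot.com'  # placeholder Composer webserver URL
CLIENT_ID = 'placeholder-client-id.apps.googleusercontent.com'  # placeholder IAP client id
DAG_NAME = 'my_dag'  # placeholder DAG id


def trigger_dag_from_pubsub(event, context):
    """Cloud Function entry point for a Pub/Sub trigger."""
    # Pub/Sub delivers the message base64-encoded in event['data'].
    payload = base64.b64decode(event['data']).decode('utf-8') if event.get('data') else ''

    # Obtain an OIDC token accepted by the IAP-protected Composer webserver.
    auth_request = google.auth.transport.requests.Request()
    id_token = google.oauth2.id_token.fetch_id_token(auth_request, CLIENT_ID)

    # Trigger the DAG, passing the Pub/Sub message through as run configuration.
    endpoint = '{}/api/experimental/dags/{}/dag_runs'.format(WEBSERVER_URL, DAG_NAME)
    response = requests.post(
        endpoint,
        headers={'Authorization': 'Bearer {}'.format(id_token)},
        json={'conf': json.dumps({'message': payload})},
    )
    response.raise_for_status()
```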

Workflow scheduling on GCP Dataproc cluster

落爺英雄遲暮 submitted on 2020-02-24 03:56:08
Question: I have some complex Oozie workflows to migrate from on-prem Hadoop to GCP Dataproc. The workflows consist of shell scripts, Python scripts, Spark-Scala jobs, Sqoop jobs, etc. I have come across some potential solutions covering my workflow scheduling needs:
Cloud Composer
Dataproc Workflow Templates with Cloud Scheduler
Installing Oozie on a Dataproc auto-scaling cluster
Please let me know which option would be most efficient in terms of performance, cost, and migration complexity. Answer 1: All 3
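
The answer excerpt is cut short above. To give a feel for the Cloud Composer option, a migrated Oozie workflow typically becomes a DAG that creates an ephemeral Dataproc cluster, runs the jobs, and tears the cluster down. The sketch below uses the Airflow 1.10 contrib Dataproc operators with placeholder project, zone, region, and job values; it is an illustration, not part of the answer:

```
# Sketch of an Oozie-style workflow as a Composer (Airflow 1.10) DAG:
# create a Dataproc cluster, run a PySpark job, then delete the cluster.
# Project, zone, region, bucket, and file names are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.dataproc_operator import (
    DataprocClusterCreateOperator,
    DataProcPySparkOperator,
    DataprocClusterDeleteOperator,
)
from airflow.utils.trigger_rule import TriggerRule

PROJECT_ID = 'my-gcp-project'
REGION = 'europe-west1'
CLUSTER_NAME = 'ephemeral-etl-cluster'

with DAG('dataproc_workflow_demo',
         start_date=datetime(2020, 2, 1),
         schedule_interval='@daily') as dag:

    create_cluster = DataprocClusterCreateOperator(
        task_id='create_cluster',
        project_id=PROJECT_ID,
        cluster_name=CLUSTER_NAME,
        num_workers=2,
        zone='europe-west1-b',
        region=REGION,
    )

    run_pyspark = DataProcPySparkOperator(
        task_id='run_pyspark_job',
        main='gs://my-bucket/jobs/etl_job.py',
        cluster_name=CLUSTER_NAME,
        region=REGION,
    )

    delete_cluster = DataprocClusterDeleteOperator(
        task_id='delete_cluster',
        project_id=PROJECT_ID,
        cluster_name=CLUSTER_NAME,
        region=REGION,
        # Tear the cluster down even if the PySpark job failed.
        trigger_rule=TriggerRule.ALL_DONE,
    )

    create_cluster >> run_pyspark >> delete_cluster
```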

Lessons learned from using Airflow for scheduling

ぐ巨炮叔叔 submitted on 2020-02-23 19:12:35
It has been nearly two months since I first came into contact with Airflow and put it into production. From deployment to development to operations and tuning, I ran into all sorts of problems along the way, and went from having merely heard of Airflow to being familiar with it. This post covers three areas.
I. Installing and deploying Airflow
Installing and using the Airflow scheduling tool
I. Install Airflow
1. Environment preparation:
1.1. Install the MySQL database
Unpack the MariaDB package:
tar -xzvf mariadb-10.2.14-linux-x86_64.tar.gz
cd mariadb-10.2.14-linux-x86_64
Configure MySQL: edit the my.cnf configuration file according to your needs.
Start MySQL:
Initialize the MySQL database: scripts/mysql_install_db
Start MySQL: nohup bin/mysqld --defaults-file=my.cnf &
Reset the MySQL root password:
bin/mysqladmin -u root password "123456"
bin/mysql -u root -p, then enter the password to log in
1.2. Create the Airflow database and user
Create the database:
mysql> create database airflow;
Create the user:
mysql> create user 'airflow'@'%' identified by 'airflow';
mysql> create user 'airflow'@
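
(The excerpt cuts off mid-statement above.) Before pointing Airflow's sql_alchemy_conn at the newly created database, it can be worth sanity-checking that the airflow user actually connects. Here is a small sketch using SQLAlchemy, assuming the host, credentials, and database name created above:

```
# Sketch: verify the freshly created airflow MySQL user and database before
# running `airflow initdb`. Credentials mirror the setup above; adjust as needed.
from sqlalchemy import create_engine

# Same URI format that goes into sql_alchemy_conn in airflow.cfg.
engine = create_engine('mysql://airflow:airflow@localhost/airflow')

with engine.connect() as connection:
    # A trivial round-trip proves the credentials and database exist.
    result = connection.execute('SELECT 1')
    print(result.scalar())
```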