airflow

pyarrow.lib.ArrowTypeError: an integer is required (got type str)

依然范特西╮ submitted on 2020-01-24 16:50:35
Question: I want to ingest the new rows from my SQL Server table. The way I found to get the differential is to use the script below. For MySQL tables it works perfectly, but when I added the pymssql library to connect to this new database and apply the differential file ingestion, I ran into the error below. I'm asking for help understanding why I can't apply the script to tables that are on SQL Server!

import os
import pandas as pd
import numpy as np
import mysql.connector as sql
from datetime import datetime,
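
The asker's script is cut off above, so the exact failure site is not shown. A minimal sketch of the likely failure mode (not the asker's actual code): SQL Server drivers such as pymssql often hand columns back as Python strings, so a column that was integer-typed in MySQL can arrive as dtype=object holding str values, and pyarrow then fails when asked to build an int64 column from them.

import pandas as pd
import pyarrow as pa

# "id" arrives as strings, as pymssql may return them:
df = pd.DataFrame({"id": ["1", "2", "3"]})

# Forcing an int64 schema onto string data reproduces the error:
# pyarrow.lib.ArrowTypeError: an integer is required (got type str)
schema = pa.schema([("id", pa.int64())])
try:
    pa.Table.from_pandas(df, schema=schema)
except pa.lib.ArrowTypeError as exc:
    print(exc)

# One common fix: cast the offending columns before converting.
df["id"] = pd.to_numeric(df["id"])  # or handle unparseable rows explicitly
table = pa.Table.from_pandas(df, schema=schema)  # now succeeds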

Unable to setup a DB2 / DashDB JDBC Connection in Apache Airflow

♀尐吖头ヾ submitted on 2020-01-24 13:33:12
Question: I'm trying to create a DB2 / DashDB connection using the Airflow UI. I have added the db2jcc4.jar driver and provided the path as well as the class name com.ibm.db2.jcc.DB2Driver.class. When I try to run a simple query (in the ad-hoc UI) I always get the same error: java.lang.RuntimeException: Class com.ibm.db2.jcc.DB2Driver.class not found. Has anybody needed to set up a DB2 / DashDB connection in Apache Airflow before? I found nothing on the web about that. Thanks. Answer 1: May be a stupid thing to check
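
The answer is truncated above, but the error message itself points at a likely culprit: the driver class was entered with a ".class" suffix, and a fully-qualified Java class name passed to JDBC must not include it. A hedged sketch of how the connection fields would look (field names per the Airflow JDBC connection form; the paths are placeholders from the question):

Connection Type : JDBC Connection
Driver Path     : /path/to/db2jcc4.jar
Driver Class    : com.ibm.db2.jcc.DB2Driver    (note: no ".class" suffix)
Host            : jdbc:db2://<host>:50000/<database>

The same distinction in plain JDBC terms: Class.forName("com.ibm.db2.jcc.DB2Driver") resolves the driver, while Class.forName("com.ibm.db2.jcc.DB2Driver.class") fails with exactly this kind of class-not-found error.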

Airflow skip current task

一个人想着一个人 submitted on 2020-01-24 12:15:07
Question: Is there a way for Airflow to skip the current task from within the PythonOperator? For example:

def execute():
    if condition:
        skip_current_task()

task = PythonOperator(task_id='task', python_callable=execute, dag=some_dag)

Skipping downstream tasks doesn't suit me (the solution proposed in this answer: How to skip tasks on Airflow?), and neither does branching. Is there a way for a task to mark its own state as skipped from within the operator? Answer 1: Figured it out! Skipping a task is as easy as: def execute(
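
The answer is cut off above. The standard Airflow mechanism it most plausibly describes (hedged, since the text is truncated) is raising AirflowSkipException, which marks the current task instance as skipped rather than failed:

from airflow.exceptions import AirflowSkipException
from airflow.operators.python_operator import PythonOperator

def execute():
    if condition:  # `condition` is from the question's pseudocode
        # Raising this marks THIS task instance as SKIPPED instead of FAILED;
        # what happens downstream then depends on each task's trigger_rule.
        raise AirflowSkipException('Skipping: condition was met')
    do_actual_work()  # hypothetical stand-in for the real task body

task = PythonOperator(task_id='task', python_callable=execute, dag=some_dag)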

Airflow Getting Started Guide

五迷三道 submitted on 2020-01-24 06:21:36
What Airflow can do: Airflow is a workflow scheduling and management system that manages task pipelines as directed acyclic graphs (DAGs), with configurable task dependencies and time-based scheduling. Airflow is independent of the tasks we actually want to run; we only need to hand each task's name and how to run it to Airflow as a task.

Installation and usage: for the simplest installation, run the following commands in a Linux terminal (python2.x and pip must already be installed):

pip install airflow
pip install "airflow[crypto, password]"

Once installation succeeds, perform the following three steps and you are ready to go. The default executor is the SequentialExecutor, which can only run tasks one after another.

Initialize the database: airflow initdb [a required step]
Start the web server: airflow webserver -p 8080 [convenient for managing DAGs visually]
Start the scheduler: airflow scheduler [once the scheduler is running, the dags in the DAG directory are triggered on their configured schedules]

We can also test a single DAG directly, e.g. the DAG at the end of this article: airflow test ct1 print_date 2016-05-14. The latest version of Airflow can be obtained from https://github
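
The DAG referenced by the airflow test command ("the DAG at the end of this article") is not included in this excerpt. A minimal sketch consistent with that command, assuming dag_id ct1 and task_id print_date (the owner, schedule, and start date are guesses), written in the Airflow 1.x style the guide uses:

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2016, 5, 13),  # assumed; must predate the test date
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG('ct1', default_args=default_args, schedule_interval=timedelta(days=1))

# The task named in the "airflow test ct1 print_date 2016-05-14" command:
print_date = BashOperator(task_id='print_date', bash_command='date', dag=dag)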

Want to create airflow tasks that are downstream of the current task

Deadly submitted on 2020-01-23 17:53:05
Question: I'm mostly brand new to Airflow. I have a two-step process: get all files that match a criterion, then uncompress the files. The files are half a gig compressed and 2-3 gig when uncompressed, and I can easily have 20+ files to process at a time, which means uncompressing all of them can run longer than just about any reasonable timeout. I could use XCom to get the results of step 1, but what I'd like to do is something like this: def processFiles(reqDir, gvcfDir, matchSuffix): theFiles = getFiles
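
The question is truncated, but the underlying constraint is that an Airflow task cannot add tasks downstream of itself at run time: the DAG shape is fixed when the DAG file is parsed. A hedged sketch of the usual workaround, fanning out one uncompress task per file at parse time so each file gets its own timeout and tasks can run in parallel (getFiles, the directory, and the suffix are hypothetical stand-ins for the asker's helpers):

import glob
import os
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def getFiles(reqDir, matchSuffix):
    # Hypothetical stand-in for the asker's helper: list matching files.
    return sorted(glob.glob(os.path.join(reqDir, '*' + matchSuffix)))

def uncompress(path):
    # Placeholder for the real uncompress step.
    print('uncompressing', path)

dag = DAG('process_files', start_date=datetime(2020, 1, 1), schedule_interval=None)

# Fan out: one task per file, created when the DAG file is parsed.
# Parallelism/pool settings then control how many run at once.
for path in getFiles('/data/requests', '.gz'):  # hypothetical arguments
    PythonOperator(
        task_id='uncompress_' + os.path.basename(path).replace('.', '_'),
        python_callable=uncompress,
        op_args=[path],
        dag=dag,
    )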

How to skip tasks on Airflow?

浪子不回头ぞ submitted on 2020-01-23 07:51:19
Question: I'm trying to understand whether Airflow supports skipping tasks in a DAG for ad-hoc executions. Let's say my DAG graph looks like this: task1 > task2 > task3 > task4, and I would like to start my DAG manually from task3; what is the best way of doing that? I've read about the ShortCircuitOperator, but I'm looking for a more ad-hoc solution that can apply once the execution is triggered. Thanks! Answer 1: You can incorporate the SkipMixin that the ShortCircuitOperator uses under the hood to skip
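
The answer above is cut off. A hedged sketch of the SkipMixin approach it describes, mirroring what ShortCircuitOperator does internally (Airflow 1.x API; should the callable return falsy, every downstream task is marked skipped):

from airflow.models import SkipMixin
from airflow.operators.python_operator import PythonOperator

class ConditionalSkipOperator(PythonOperator, SkipMixin):
    # Runs the python_callable; if it returns a falsy value, marks all
    # downstream tasks as skipped via SkipMixin.skip(), the same call
    # ShortCircuitOperator makes under the hood.
    def execute(self, context):
        result = super(ConditionalSkipOperator, self).execute(context)
        if not result:
            downstream = context['task'].get_flat_relatives(upstream=False)
            self.skip(context['dag_run'], context['execution_date'], downstream)
        return result

For the ad-hoc "start from task3" case, the callable could read a flag out of context['dag_run'].conf supplied at trigger time to decide, per run, whether the guarded tasks should execute or be skipped.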

airflow systemd fails due to gunicorn

余生长醉 submitted on 2020-01-23 01:49:05
Question: I am unable to start the Airflow webserver using systemd, even though it starts and functions properly outside of systemd, like so:

export AIRFLOW_HOME=/path/to/my/airflow/home ; airflow webserver -p 8080

The systemd log leads me to believe that the issue comes from gunicorn, even though gunicorn starts without issue when I run the above command (i.e. it's only an issue under systemd). I have configured the following systemd files according to the Airflow docs (running Ubuntu 16). /etc/default
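
The asker's unit files are cut off above. A frequent cause of "works in a shell, fails under systemd" with Airflow is that the unit's environment lacks the PATH entry for the directory holding the gunicorn executable, which the webserver shells out to. A hedged sketch of a webserver unit in the spirit of the examples shipped with Airflow (every path, user, and the PATH value here are assumptions):

[Unit]
Description=Airflow webserver daemon
After=network.target

[Service]
User=airflow
Group=airflow
Environment="AIRFLOW_HOME=/path/to/my/airflow/home"
# Include the directory that contains the gunicorn binary (e.g. the venv's bin):
Environment="PATH=/path/to/venv/bin:/usr/local/bin:/usr/bin:/bin"
ExecStart=/path/to/venv/bin/airflow webserver -p 8080
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target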

Airflow - Disable heartbeat logs

随声附和 submitted on 2020-01-22 21:29:26
Question: My logs are getting completely flooded with useless messages on every heartbeat:

[2019-11-27 21:32:47,890] {{logging_mixin.py:112}} INFO - [2019-11-27 21:32:47,889] {local_task_job.py:124} WARNING - Time since last heartbeat(0.02 s) < heartrate(5.0 s), sleeping for 4.983326 s
[2019-11-27 21:32:52,921] {{logging_mixin.py:112}} INFO - [2019-11-27 21:32:52,921] {local_task_job.py:124} WARNING - Time since last heartbeat(0.02 s) < heartrate(5.0 s), sleeping for 4.984673 s
[2019-11-27 21:32:57
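
One way to silence these lines, offered as an assumption rather than an official Airflow setting, is a standard logging.Filter that drops records containing the heartbeat message. Where to install it (a custom logging config, airflow_local_settings.py, etc.) depends on the deployment; attaching it to the 'airflow.task' handlers is shown here as one assumed attachment point:

import logging

class DropHeartbeatNoise(logging.Filter):
    # Return False to drop the record, True to keep it.
    def filter(self, record):
        return 'Time since last heartbeat' not in record.getMessage()

for handler in logging.getLogger('airflow.task').handlers:
    handler.addFilter(DropHeartbeatNoise())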