airflow

Airflow scheduling framework: single-node installation notes

Submitted by 送分小仙女 on 2020-01-15 06:49:58
crontab jobs are not convenient to monitor day to day, so I decided to switch to a new scheduling framework.

1. Install dependencies

    # avoid storing connection passwords in plain text
    pip3 install cryptography
    pip3 install paramiko
    # fixes: AttributeError: module 'enum' has no attribute 'IntFlag'
    pip3 uninstall enum34
    pip3 install celery
    pip3 install redis
    pip3 install dask
    yum install mysql-devel
    pip3 install mysqlclient
    pip3 install apache-airflow
    # avoid generating a large volume of logs
    cd /usr/local/lib/python3.7/site-packages/airflow
    vim settings.py
    # LOGGING_LEVEL = logging.INFO
    LOGGING_LEVEL = logging.WARN

2. Configure environment variables

    # vim /etc/profile
    # set the airflow working directory; by default it lives under the current user's home directory
    export AIRFLOW_HOME=/usr/local/airflow
    # source
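The excerpt is cut off here. As a hedged continuation (not from the original post; Airflow 1.10-era CLI assumed, port number illustrative), the usual next steps on a single node look roughly like this:

    source /etc/profile            # pick up AIRFLOW_HOME
    airflow initdb                 # create and initialize the metadata database
    airflow webserver -p 8080      # start the web UI
    airflow scheduler              # start the scheduler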

Export all airflow connections to new environment

Submitted by 邮差的信 on 2020-01-15 06:22:05
Question: I'm trying to migrate all of the existing Airflow connections to a new Airflow installation. Looking at the CLI options (airflow connections --help), there is an option to list connections but none to export/import them to/from JSON. Is there a way, via the CLI or the Airflow UI, to migrate connections across multiple Airflow deployments? Answer 1: You can connect directly to the Airflow meta DB, dump those connections, and then load them into the separate database. However, if you want to automate something like this,
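The quoted answer stops mid-sentence. As a hedged illustration of the "dump from the meta DB" approach it describes (Airflow 1.10.x assumed; the output file name is illustrative, not from the post):

    # Sketch: export connections from the Airflow metadata DB to JSON
    import json
    from airflow.models import Connection
    from airflow.settings import Session

    session = Session()
    conns = []
    for c in session.query(Connection):
        conns.append({
            "conn_id": c.conn_id,
            "conn_type": c.conn_type,
            "host": c.host,
            "schema": c.schema,
            "login": c.login,
            "password": c.password,   # decrypted via the fernet key, so handle the output carefully
            "port": c.port,
            "extra": c.extra,
        })
    session.close()
    with open("connections.json", "w") as f:
        json.dump(conns, f, indent=2)

On the target environment each entry can then be re-created with the CLI, e.g. airflow connections -a --conn_id my_conn --conn_uri 'postgres://user:pass@host:5432/db' (conn_id and URI here are illustrative).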

Use airflow hive operator and output to a text file

Submitted by 有些话、适合烂在心里 on 2020-01-14 22:33:37
Question: Hi, I want to execute a Hive query using the Airflow HiveOperator and write the result to a file. I don't want to use INSERT OVERWRITE here.

    hive_ex = HiveOperator(
        task_id='hive-ex',
        hql='/sql/hive-ex.sql',
        hiveconfs={
            'DAY': '{{ ds }}',
            'YESTERDAY': '{{ yesterday_ds }}',
            'OUTPUT': '{{ file_path }}' + 'csv',
        },
        dag=dag,
    )

What is the best way to do this? I know how to do it with the BashOperator, but want to know if we can use the HiveOperator: hive_ex = BashOperator( task_id='hive-ex', bash_command=
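The excerpt is truncated at the BashOperator example. A hedged sketch of what that bash-based fallback typically looks like (output path and script name are illustrative, not from the post; assumes the dag object from the poster's DAG file):

    # Sketch: run the Hive query from a BashOperator and redirect stdout to a file
    from airflow.operators.bash_operator import BashOperator

    hive_ex = BashOperator(
        task_id='hive-ex',
        bash_command=(
            'hive --hiveconf DAY={{ ds }} --hiveconf YESTERDAY={{ yesterday_ds }} '
            '-f /sql/hive-ex.sql > {{ params.output_path }}'
        ),
        params={'output_path': '/tmp/hive-ex.csv'},
        dag=dag,
    )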

Airflow task after BranchPythonOperator does not fail and succeed correctly

Submitted by 我与影子孤独终老i on 2020-01-14 13:47:08
Question: In my DAG, I have some tasks that should only run on Saturdays. I therefore used a BranchPythonOperator to branch between the Saturday tasks and a dummy task. After that, I join both branches and want to run other tasks. The workflow looks like this (diagram omitted in this excerpt). Here I set the trigger rule for dummy3 to 'one_success' and everything works fine. The problem I encountered is when something upstream of the BranchPythonOperator fails: the BranchPythonOperator and the branches correctly have the state
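For reference, a hedged sketch of the branch-and-join layout the post describes (task names and the Saturday check are illustrative, and whether 'one_success' is the right rule for the join is exactly what the question is about; assumes an existing dag object, Airflow 1.10.x style):

    # Sketch of the branch/join pattern: Saturday-only branch vs. dummy branch, joined by dummy3
    from airflow.operators.python_operator import BranchPythonOperator
    from airflow.operators.dummy_operator import DummyOperator

    def choose_branch(**context):
        # run the Saturday-only task on Saturdays, otherwise skip straight to the dummy branch
        return 'saturday_task' if context['execution_date'].weekday() == 5 else 'dummy1'

    branch = BranchPythonOperator(task_id='branch', python_callable=choose_branch,
                                  provide_context=True, dag=dag)
    saturday_task = DummyOperator(task_id='saturday_task', dag=dag)
    dummy1 = DummyOperator(task_id='dummy1', dag=dag)
    dummy3 = DummyOperator(task_id='dummy3', trigger_rule='one_success', dag=dag)

    branch >> [saturday_task, dummy1]
    [saturday_task, dummy1] >> dummy3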

How to delete XCOM objects once the DAG finishes its run in Airflow

Submitted by 醉酒当歌 on 2020-01-14 08:11:11
Question: I have a huge JSON file in XCom that I no longer need once the DAG execution is finished, but I still see the XCom object in the UI with all the data. Is there any way to delete the XCom programmatically once the DAG run is finished? Thank you. Answer 1: You have to add a task, depending on your metadata DB (SQLite, PostgreSQL, MySQL, ...), that deletes the XCom entries once the DAG run is finished. delete_xcom_task = PostgresOperator( task_id='delete-xcom-task', postgres_conn_id='airflow_db', sql="delete from
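The quoted answer is cut off inside the SQL string. A hedged completion of that cleanup task (table and column names follow the default Airflow schema; the exact WHERE clause is an assumption, not from the post):

    # Sketch: cleanup task that deletes this DAG run's XCom rows from the metadata DB
    from airflow.operators.postgres_operator import PostgresOperator

    delete_xcom_task = PostgresOperator(
        task_id='delete-xcom-task',
        postgres_conn_id='airflow_db',
        sql="DELETE FROM xcom WHERE dag_id = '{{ dag.dag_id }}' AND execution_date = '{{ ts }}'",
        dag=dag,
    )
    # wire it in as the last task of the DAG, e.g. final_task >> delete_xcom_task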

For Apache Airflow, how can I pass parameters when manually triggering a DAG via the CLI?

Submitted by 爱⌒轻易说出口 on 2020-01-13 10:26:29
Question: I use Airflow to manage ETL task execution and scheduling. A DAG has been created and it works fine, but is it possible to pass parameters when manually triggering the DAG via the CLI? For example: my DAG runs every day at 01:30 and processes data for yesterday (the time range from 01:30 yesterday to 01:30 today). There might be issues with the data source, and then I need to re-process that data (manually specifying the time range). So can I create such an Airflow DAG, when it's scheduled, that the default
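A hedged sketch of the usual pattern (not quoted from the post): pass a JSON payload with airflow trigger_dag my_dag --conf '{"start": "...", "end": "..."}' and read it from dag_run.conf inside the task, falling back to the scheduled window when no override is supplied (key names are illustrative; assumes an existing dag object):

    # Sketch: read optional CLI-supplied parameters from dag_run.conf
    from airflow.operators.python_operator import PythonOperator

    def process(**context):
        dag_run = context.get('dag_run')
        conf = (dag_run.conf or {}) if dag_run else {}
        # fall back to the scheduled window when no manual override is given
        start = conf.get('start', context['execution_date'])
        end = conf.get('end', context['next_execution_date'])
        print('processing window', start, end)

    process_task = PythonOperator(task_id='process', python_callable=process,
                                  provide_context=True, dag=dag)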

Export environment variables at runtime with airflow

Submitted by 帅比萌擦擦* on 2020-01-13 09:38:26
Question: I am currently converting workflows that were previously implemented as bash scripts into Airflow DAGs. In the bash scripts I simply exported the variables at run time with export HADOOP_CONF_DIR="/etc/hadoop/conf". Now I'd like to do the same in Airflow, but haven't found a solution yet. The one workaround I found was setting the variables with os.environ[VAR_NAME]='some_text' outside of any method or operator, but that means they get exported the moment the script gets loaded, not at
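The excerpt is truncated, but as a hedged illustration of keeping the export at task run time rather than at DAG-parse time (the variable name comes from the post, everything else is illustrative; assumes an existing dag object): either set it inside the callable, or pass it per task via the BashOperator's env argument.

    # Sketch: export HADOOP_CONF_DIR only when the task actually runs
    import os
    from airflow.operators.python_operator import PythonOperator
    from airflow.operators.bash_operator import BashOperator

    def run_with_env(**context):
        os.environ['HADOOP_CONF_DIR'] = '/etc/hadoop/conf'   # set at run time, not at import time
        # ... call whatever needs the variable here ...

    py_task = PythonOperator(task_id='py_task', python_callable=run_with_env,
                             provide_context=True, dag=dag)

    # or hand the variable only to the bash command that needs it
    sh_task = BashOperator(task_id='sh_task', bash_command='echo $HADOOP_CONF_DIR',
                           env={'HADOOP_CONF_DIR': '/etc/hadoop/conf'}, dag=dag)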

Run .EXE and Powershell tasks with Airflow

Submitted by 不打扰是莪最后的温柔 on 2020-01-13 06:11:07
Question: Our systems are basically just Windows servers running C# and PowerShell applications in conjunction with MS SQL Server. We have an in-house workflow-management solution that can run tasks executing EXE/BAT/PS1 files and even call DLL functions. Now I am evaluating whether Apache Airflow would be a better solution for us. My naive plan so far is to run the Airflow scheduler on a Linux machine and let the consumers run on Windows machines. But how would I set up the consumer to run a .exe task for
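Not from the post, but one commonly suggested pattern for this setup is to keep Airflow on Linux and reach the Windows machines over SSH (a Windows OpenSSH server and an existing SSH connection are assumed; paths, connection id, and dag object are illustrative). A hedged sketch with the contrib SSHOperator:

    # Sketch: run an .exe or a PowerShell script on a remote Windows host via SSH
    from airflow.contrib.operators.ssh_operator import SSHOperator

    run_exe = SSHOperator(
        task_id='run_exe',
        ssh_conn_id='windows_host',          # SSH connection to the Windows server (assumed to exist)
        command='C:\\apps\\my_tool.exe --flag value',
        dag=dag,
    )

    run_ps = SSHOperator(
        task_id='run_ps',
        ssh_conn_id='windows_host',
        command='powershell.exe -File C:\\scripts\\job.ps1',
        dag=dag,
    )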

Airflow psycopg2.OperationalError: FATAL: sorry, too many clients already

Submitted by 无人久伴 on 2020-01-13 02:54:41
Question: I have a four-node clustered Airflow environment that has been working fine for me for a few months now.

EC2 instances:
Server 1: Webserver, Scheduler, Redis Queue, PostgreSQL Database
Server 2: Webserver
Server 3: Worker
Server 4: Worker

Recently I've been working on a more complex DAG with a few dozen tasks, compared to the relatively small ones I was working on beforehand. I'm not sure if that's why I'm only now seeing this error, but I'll sporadically get this error: On
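The error text is cut off, but since "FATAL: sorry, too many clients already" comes from PostgreSQL hitting its max_connections limit, a hedged place to start tuning (values purely illustrative, not from the post) is the SQLAlchemy pool settings in airflow.cfg on every node and the Postgres limit itself; a pooler such as PgBouncer in front of the database is another common mitigation.

    # airflow.cfg (on each webserver / scheduler / worker)
    [core]
    sql_alchemy_pool_size = 5        # DB connections kept open per Airflow process
    sql_alchemy_max_overflow = 10    # extra connections allowed under load
    parallelism = 32                 # cap on concurrently running task instances

    # postgresql.conf on Server 1
    max_connections = 200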