apache-airflow

How do I set up Airflow across 2 servers?

不想你离开。 Submitted on 2020-01-24 18:33:12
Question: I'm trying to split the Airflow processes across 2 servers. Server A, which has been running in standalone mode with everything on it, has the DAGs, and I'd like to make it the worker in the new setup with an additional server. Server B is the new server, which would host the metadata database on MySQL. Can I have Server A run the LocalExecutor, or would I have to use the CeleryExecutor? Would the airflow scheduler have to run on the server that has the DAGs? Or does it have to run on every server?
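A minimal airflow.cfg sketch for the split described above (an assumption, not part of the original question): Server A keeps the scheduler, webserver and LocalExecutor, and only the metadata database moves to Server B; CeleryExecutor is only needed if tasks must run on machines other than the one running the scheduler. Host name, credentials and database name are placeholders.

# airflow.cfg on Server A (placeholder values throughout)
[core]
executor = LocalExecutor
dags_folder = /home/airflow/dags
# Connection string pointing at the metadata DB hosted on Server B
sql_alchemy_conn = mysql://airflow_user:airflow_pw@server-b.example.com:3306/airflow

With this layout, airflow scheduler and airflow webserver both run on Server A (the machine that has the DAG files), and Server B runs only MySQL.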

Want to create Airflow tasks that are downstream of the current task

Deadly Submitted on 2020-01-23 17:53:05
Question: I'm mostly brand new to Airflow. I have a two-step process: get all files that match a criterion, then uncompress the files. The files are half a gig compressed and 2-3 gig when uncompressed. I can easily have 20+ files to process at a time, which means uncompressing all of them can run longer than just about any reasonable timeout. I could use XCom to get the results of step 1, but what I'd like to do is something like this: def processFiles (reqDir, gvcfDir, matchSuffix): theFiles = getFiles
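A minimal sketch of the usual Airflow pattern for this situation (an assumption, not the poster's code): generate one uncompress task per file when the DAG file is parsed, so the files decompress in parallel and no single task has to fit every file inside one timeout. The directory paths, suffix and helper function are hypothetical placeholders.

import glob
import os
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

REQ_DIR = "/data/incoming"        # hypothetical input directory
GVCF_DIR = "/data/uncompressed"   # hypothetical output directory

def get_matching_files(req_dir, suffix):
    # Stand-in for step 1: list compressed files that match the criterion.
    return glob.glob(os.path.join(req_dir, "*" + suffix))

with DAG("uncompress_files",
         start_date=datetime(2020, 1, 1),
         schedule_interval=None) as dag:
    # Step 2: one task per file, created at DAG-parse time.
    for path in get_matching_files(REQ_DIR, ".gz"):
        name = os.path.basename(path)
        BashOperator(
            task_id="uncompress_" + name.replace(".", "_"),
            bash_command="gunzip -c '{}' > '{}'".format(
                path, os.path.join(GVCF_DIR, name[:-3])),
        )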

How to delete XCOM objects once the DAG finishes its run in Airflow

醉酒当歌 Submitted on 2020-01-14 08:11:11
Question: I have a huge JSON file in the XCom which I no longer need once the DAG execution is finished, but I still see the XCom object in the UI with all the data. Is there any way to delete the XCom programmatically once the DAG run is finished? Thank you. Answer 1: You have to add a task, dependent on your metadata DB (SQLite, PostgreSQL, MySQL, ...), that deletes the XCom once the DAG run is finished. delete_xcom_task = PostgresOperator( task_id='delete-xcom-task', postgres_conn_id='airflow_db', sql="delete from
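A sketch of such a cleanup task with the SQL written out in full (an assumption based on the answer above, not its verbatim code): it assumes a PostgreSQL metadata database reachable through an Airflow connection named airflow_db and an existing DAG object named dag.

from airflow.operators.postgres_operator import PostgresOperator

delete_xcom_task = PostgresOperator(
    task_id="delete-xcom-task",
    postgres_conn_id="airflow_db",
    # The sql field is templated, so the current DAG id is filled in at run time.
    sql="DELETE FROM xcom WHERE dag_id = '{{ dag.dag_id }}'",
    dag=dag,
)

# Make it the last step so it only runs once the rest of the DAG has finished:
# final_processing_task >> delete_xcom_task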

Run .EXE and PowerShell tasks with Airflow

不打扰是莪最后的温柔 Submitted on 2020-01-13 06:11:07
Question: Our systems are basically just Windows Servers running C# and PowerShell applications in conjunction with MS SQL Server. We have an in-house workflow-management solution that is able to run tasks that execute EXE/BAT/PS1 files and even call DLL functions. Now I am evaluating whether Apache Airflow is a better solution for us. My naive plan so far is to run the airflow scheduler on a Linux machine and then let the consumers run on Windows machines. But how would I set up the consumer to run a .exe task for
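One possible approach, sketched below under explicit assumptions (it is not from the original post): keep the Airflow scheduler and workers on Linux, since Airflow workers are not supported natively on Windows, and reach the Windows machines over SSH (for example Windows' built-in OpenSSH server) with the SSHOperator. The connection id, script path and executable path are hypothetical placeholders, and an existing DAG object named dag is assumed.

from airflow.contrib.operators.ssh_operator import SSHOperator

# Runs a PowerShell script on a Windows host over a pre-configured SSH connection.
run_ps1 = SSHOperator(
    task_id="run_powershell_script",
    ssh_conn_id="windows_worker_ssh",
    command='powershell.exe -File "C:\\jobs\\nightly.ps1"',
    dag=dag,
)

# Runs a plain .exe on the same Windows host.
run_exe = SSHOperator(
    task_id="run_exe_task",
    ssh_conn_id="windows_worker_ssh",
    command='"C:\\apps\\mytool.exe" --run',
    dag=dag,
)

run_ps1 >> run_exe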

Airflow: Passing a dynamic value to a SubDAG operator

拟墨画扇 Submitted on 2020-01-12 08:11:35
Question: I am new to Airflow. I have come across a scenario where the parent DAG needs to pass some dynamic number (let's say n) to a SubDAG, and the SubDAG will use this number to dynamically create n parallel tasks. The Airflow documentation doesn't cover a way to achieve this, so I have explored a couple of ways: Option 1 (using XCom pull): I have tried to pass it as an XCom value, but for some reason the SubDAG is not resolving to the passed value. Parent DAG file: def load_dag(**kwargs): number_of_runs = json
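A sketch of one common workaround (an assumption, not the poster's Option 1): because the SubDAG is built when the DAG file is parsed, a runtime XCom value is not yet available, so the count is read from an Airflow Variable instead. The Variable name, DAG ids and task ids are placeholders.

from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.subdag_operator import SubDagOperator

def build_subdag(parent_dag_id, child_id, n, args):
    # Builds a SubDAG containing n independent (parallel) tasks.
    subdag = DAG(dag_id="{}.{}".format(parent_dag_id, child_id),
                 default_args=args, schedule_interval=None)
    for i in range(n):
        DummyOperator(task_id="parallel_task_{}".format(i), dag=subdag)
    return subdag

default_args = {"start_date": datetime(2020, 1, 1)}
# Read n at parse time; the Variable can be updated between runs.
number_of_runs = int(Variable.get("number_of_runs", default_var=1))

with DAG("parent_dag", default_args=default_args,
         schedule_interval=None) as dag:
    SubDagOperator(
        task_id="dynamic_subdag",
        subdag=build_subdag("parent_dag", "dynamic_subdag",
                            number_of_runs, default_args),
    )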

Apache Airflow DAG cannot import local module

浪子不回头ぞ Submitted on 2020-01-11 04:27:25
Question: I do not seem to understand how to import modules into an Apache Airflow DAG definition file. I want to do this to be able to create a library which makes declaring tasks with similar settings less verbose, for instance. Here is the simplest example I can think of that replicates the issue: I modified the Airflow tutorial (https://airflow.apache.org/tutorial.html#recap) to simply import a module and run a definition from that module, like so: Directory structure: - dags/ -- __init__.py
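A sketch of a layout that usually works (the module and function names are hypothetical, not from the original post): Airflow puts the dags/ folder itself on sys.path, so a sibling module placed next to the DAG file can normally be imported by its plain name.

# dags/helpers.py -- the small library of factory functions
from airflow.operators.bash_operator import BashOperator

def make_echo_task(task_id, dag):
    # Hides the repeated operator settings behind one call.
    return BashOperator(task_id=task_id, bash_command="echo hello", dag=dag)


# dags/my_dag.py -- the DAG definition file
from datetime import datetime

from airflow import DAG
from helpers import make_echo_task   # sibling module in the dags/ folder

with DAG("uses_local_module",
         start_date=datetime(2020, 1, 1),
         schedule_interval=None) as dag:
    make_echo_task("say_hello", dag)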

Copy files from one Google Cloud Storage bucket to another using Apache Airflow

狂风中的少年 Submitted on 2020-01-03 02:30:09
Question: Problem: I want to copy files from a folder in a Google Cloud Storage bucket (e.g. Folder1 in Bucket1) to another bucket (e.g. Bucket2). I can't find any Airflow operator for Google Cloud Storage to copy files. Answer 1: I know this is an old question, but I found myself dealing with this task too. Since I'm using Google Cloud Composer, GoogleCloudStorageToGoogleCloudStorageOperator was not available in the current version. I managed to solve this issue by using a simple BashOperator from airflow
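A sketch along the lines of the answer above (the exact command is an assumption, not the answer's verbatim code): it relies on gsutil being available on the workers, as it is on Cloud Composer, and on an existing DAG object named dag; bucket and folder names are placeholders.

from airflow.operators.bash_operator import BashOperator

copy_gcs_folder = BashOperator(
    task_id="copy_gcs_folder",
    # -m parallelises the copy, -r recurses into the folder.
    bash_command="gsutil -m cp -r gs://Bucket1/Folder1/* gs://Bucket2/Folder1/",
    dag=dag,
)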