airflow

Proper way to create dynamic workflows in Airflow

情到浓时终转凉″ submitted on 2019-12-17 17:24:18
Problem: Is there any way in Airflow to create a workflow such that the number of tasks B.* is unknown until completion of Task A? I have looked at subdags, but it looks like they can only work with a static set of tasks that has to be determined at DAG creation. Would DAG triggers work? If so, could you please provide an example? I have an issue where it is impossible to know the number of Task B's that will be needed to calculate Task C until Task A has been completed. Each Task B.* will…
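
The excerpt cuts off before any answer, but a common workaround for this class of problem is to generate the B tasks at DAG-parse time from an Airflow Variable that Task A (or an external process) writes. A minimal sketch, assuming an Airflow 1.10-era API; the Variable name num_b_tasks and the DummyOperator placeholders are illustrative, not from the question:

    # Parse-time sketch: the number of B tasks comes from an Airflow Variable,
    # so Task A (or an external process) must write that Variable before the
    # scheduler next parses this DAG file.
    from datetime import datetime

    from airflow import DAG
    from airflow.models import Variable
    from airflow.operators.dummy_operator import DummyOperator

    dag = DAG('dynamic_b_tasks', start_date=datetime(2019, 1, 1),
              schedule_interval='@daily')

    task_a = DummyOperator(task_id='task_a', dag=dag)
    task_c = DummyOperator(task_id='task_c', dag=dag)

    # 'num_b_tasks' is a hypothetical Variable name; fall back to 3 if unset.
    for i in range(int(Variable.get('num_b_tasks', default_var=3))):
        task_b = DummyOperator(task_id='task_b_%d' % i, dag=dag)
        task_a >> task_b >> task_c

The limitation is that the DAG's shape only changes when the scheduler re-parses the file, not within a single run, which is presumably why the question also asks about DAG triggers.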

How to define a DAG that schedules a monthly job together with a daily job?

好久不见. submitted on 2019-12-17 17:15:59
Problem: I have to update a table Foo monthly and another table Bar daily, then join these two tables daily and insert the result into a third table Bazz. Is it possible to configure this so that Foo is updated on a certain day (say the 5th) while Bar is updated daily, and both are in the same DAG? Answer 1: This behaviour can be achieved within a single DAG using either of the following alternatives: ShortCircuitOperator, or AirflowSkipException (better, in my opinion). Basically, your DAG would still run each day (schedule_interval=…
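
A sketch of the ShortCircuitOperator alternative mentioned in the answer (the table names follow the question; the operators are placeholders): the DAG runs daily, but everything downstream of the check is skipped unless the execution date is the 5th.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator
    from airflow.operators.python_operator import ShortCircuitOperator

    dag = DAG('foo_bar_bazz', start_date=datetime(2019, 1, 1),
              schedule_interval='@daily')

    # Returns True only on the 5th; on other days downstream tasks are skipped.
    is_fifth = ShortCircuitOperator(
        task_id='is_fifth_of_month',
        python_callable=lambda execution_date, **_: execution_date.day == 5,
        provide_context=True,
        dag=dag)

    update_foo = DummyOperator(task_id='update_foo', dag=dag)  # monthly
    update_bar = DummyOperator(task_id='update_bar', dag=dag)  # daily

    # 'none_failed' lets the join run even on days when update_foo was skipped.
    join_into_bazz = DummyOperator(task_id='join_into_bazz',
                                   trigger_rule='none_failed', dag=dag)

    is_fifth >> update_foo >> join_into_bazz
    update_bar >> join_into_bazz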

How to create a conditional task in Airflow

我们两清 submitted on 2019-12-17 08:30:14
Problem: I would like to create a conditional task in Airflow as described in the schema below. The expected scenario is the following: Task 1 executes; if Task 1 succeeds, then execute Task 2a; else, if Task 1 fails, then execute Task 2b; finally, execute Task 3. All of the tasks above are SSHExecuteOperators. I'm guessing I should be using the ShortCircuitOperator and/or XCom to manage the condition, but I am not clear on how to implement that. Could you please describe the solution? Answer 1: You have to use airflow…
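
One way to wire up this success/failure branching is with trigger rules, shown here with BashOperator stand-ins for the asker's SSH tasks; this is a sketch of a common pattern, not necessarily the truncated answer's exact solution:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    dag = DAG('conditional_tasks', start_date=datetime(2019, 1, 1),
              schedule_interval=None)

    task_1 = BashOperator(task_id='task_1', bash_command='exit 0', dag=dag)

    # Default trigger rule 'all_success': runs only if task_1 succeeded.
    task_2a = BashOperator(task_id='task_2a', bash_command='echo 2a', dag=dag)

    # Runs only if task_1 failed.
    task_2b = BashOperator(task_id='task_2b', bash_command='echo 2b',
                           trigger_rule='all_failed', dag=dag)

    # Runs as soon as either branch has succeeded.
    task_3 = BashOperator(task_id='task_3', bash_command='echo 3',
                          trigger_rule='one_success', dag=dag)

    task_1 >> [task_2a, task_2b]
    task_2a >> task_3
    task_2b >> task_3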

How does Airflow's BranchPythonOperator work?

倾然丶 夕夏残阳落幕 submitted on 2019-12-14 04:17:17
Problem: I'm struggling to understand how BranchPythonOperator in Airflow works. I know it's primarily used for branching, but I am confused by the documentation as to what to pass into a task and what I need to pass/expect from the upstream task. Given the simple example in the documentation on this page, what would the source code look like for the upstream task called run_this_first and the two downstream ones that are branched? How exactly does Airflow know to run branch_a instead of branch_b? Where…
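
A minimal sketch of what the full example could look like (the random choice stands in for whatever real condition the branch callable would evaluate): the callable returns the task_id to follow, and Airflow marks the branch operator's other downstream tasks as skipped.

    import random
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator
    from airflow.operators.python_operator import BranchPythonOperator

    dag = DAG('branch_example', start_date=datetime(2019, 1, 1),
              schedule_interval='@daily')

    run_this_first = DummyOperator(task_id='run_this_first', dag=dag)

    # Airflow runs the task whose task_id this callable returns and skips the
    # other tasks directly downstream of the branch operator.
    branching = BranchPythonOperator(
        task_id='branching',
        python_callable=lambda: random.choice(['branch_a', 'branch_b']),
        dag=dag)

    branch_a = DummyOperator(task_id='branch_a', dag=dag)
    branch_b = DummyOperator(task_id='branch_b', dag=dag)

    run_this_first >> branching >> [branch_a, branch_b]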

Scheduling Spark jobs on a timely basis

爷，独闯天下 submitted on 2019-12-14 03:43:05
Problem: Which is the recommended tool for scheduling Spark jobs on a daily/weekly basis? 1) Oozie 2) Luigi 3) Azkaban 4) Chronos 5) Airflow. Thanks in advance. Answer 1: Updating my previous answer from here: Suggestion for scheduling tool(s) for building hadoop based data pipelines. Airflow: try this first. Decent UI, Python-ish job definition, semi-accessible for non-programmers; the dependency declaration syntax is weird. Airflow has built-in support for the fact that scheduled jobs often need to be…
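
For the Airflow option, a daily Spark submission is typically just a scheduled DAG around spark-submit; a sketch with hypothetical paths (the SparkSubmitOperator in airflow.contrib is an alternative wrapper if the Spark binaries are available on the worker):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    dag = DAG('daily_spark_job', start_date=datetime(2019, 1, 1),
              schedule_interval='@daily', catchup=False)

    # /opt/jobs/my_job.py is a placeholder for the real Spark application.
    submit_job = BashOperator(
        task_id='submit_spark_job',
        bash_command='spark-submit --master yarn --deploy-mode cluster '
                     '/opt/jobs/my_job.py',
        dag=dag)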

Airflow Impersonation with 'run_as_user' Not Working

自古美人都是妖i submitted on 2019-12-13 18:22:04
Problem: I am trying to get impersonation working, without success. I am following the instructions here: https://airflow.apache.org/security.html#impersonation. I launched the airflow webserver, scheduler, and worker as sudo, running under the 'airflow' user. This user is set up in the sudoers file to allow passwordless logins. I created a BashOperator and a PythonOperator with the run_as_user parameter set to an existing user named 'linus' on the server. When I am logged in as 'airflow', I am able to switch…
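
A minimal reproduction of the setup described in the question (a sketch; 'linus' follows the question, the rest is illustrative): run_as_user makes the worker sudo to that user when executing the task, which is what the sudoers rule from the linked docs must permit.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    dag = DAG('impersonation_test', start_date=datetime(2019, 1, 1),
              schedule_interval=None)

    # If impersonation works, the task log should show 'linus', not 'airflow'.
    who_am_i = BashOperator(
        task_id='who_am_i',
        bash_command='whoami',
        run_as_user='linus',
        dag=dag)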

Composer on Google Platform not available for Python 3

这一生的挚爱 submitted on 2019-12-13 18:13:22
Problem: According to the documentation here: https://cloud.google.com/composer/docs/release-notes, Composer (GCP's Airflow) is supposed to be available for Python 3 in the console. However, I am seeing no options for Python 3 in the console. Answer 1: Python 3 support is a beta feature in Composer; this doc describes how to enable and use beta features in Cloud Composer. Answer 2: I was confronted with the same problem and was able to solve it. Since there is a checkbox for Enable Beta Features in the upper…
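
Besides the console checkbox, a Python 3 environment could at the time be created from the beta CLI track; a sketch with a placeholder environment name and location (the exact flag may have changed since these 2019 answers):

    gcloud beta composer environments create my-environment \
        --location us-central1 \
        --python-version 3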

How to enable SSL on Airflow Webserver?

柔情痞子 submitted on 2019-12-13 16:17:43
Problem: I've been trying to enable HTTPS via SSL on my Apache Airflow frontend, but the documentation is quite sparse and there aren't many good examples of this online. My instance of Airflow is currently running on a Red Hat Linux VM. I've tried generating a key/certificate and pointing the configuration file to the respective paths, but it does not seem to work. From the Airflow documentation, it seems like we are supposed to simply generate a path to the cert and key and add a path to the SSL…
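
For reference, the webserver reads the certificate and key paths from the [webserver] section of airflow.cfg; a sketch with a self-signed pair for testing (all paths and the hostname are examples only):

    # Generate a self-signed certificate/key pair for testing (example paths).
    openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
        -keyout /etc/airflow/ssl/airflow.key \
        -out /etc/airflow/ssl/airflow.crt \
        -subj '/CN=my-airflow-host'

    # airflow.cfg
    [webserver]
    web_server_ssl_cert = /etc/airflow/ssl/airflow.crt
    web_server_ssl_key = /etc/airflow/ssl/airflow.key

After restarting the webserver, it should serve HTTPS on its configured web_server_port.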