airflow

Running Airflow tasks/DAGs in parallel

前提是你 submitted on 2020-02-03 04:04:57
Question: I'm using Airflow to orchestrate some Python scripts. I have a "main" DAG from which several subdags are run. My main DAG is supposed to run according to the following overview: I've managed to get to this structure in my main DAG by using the following lines:

etl_internal_sub_dag1 >> etl_internal_sub_dag2 >> etl_internal_sub_dag3
etl_internal_sub_dag3 >> etl_adzuna_sub_dag
etl_internal_sub_dag3 >> etl_adwords_sub_dag
etl_internal_sub_dag3 >> etl_facebook_sub_dag
etl_internal_sub_dag3 >> etl
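
For context, a minimal sketch of how such a fan-out could be wired up, assuming Airflow 1.10-era imports and a hypothetical make_subdag() factory for the sub-DAGs (all IDs below are illustrative, not the question's actual code):

from datetime import datetime

from airflow import DAG
from airflow.operators.subdag_operator import SubDagOperator


def make_subdag(parent_dag_id, child_id, args):
    # Hypothetical factory returning a trivial sub-DAG whose dag_id follows
    # the required "<parent_dag_id>.<task_id>" convention.
    return DAG(dag_id="{}.{}".format(parent_dag_id, child_id),
               default_args=args,
               schedule_interval="@daily")


default_args = {"owner": "airflow", "start_date": datetime(2020, 1, 1)}

with DAG("main_dag", default_args=default_args,
         schedule_interval="@daily") as dag:
    names = ["etl_internal_sub_dag1", "etl_internal_sub_dag2",
             "etl_internal_sub_dag3", "etl_adzuna_sub_dag",
             "etl_adwords_sub_dag", "etl_facebook_sub_dag"]
    subdags = {name: SubDagOperator(task_id=name,
                                    subdag=make_subdag("main_dag", name,
                                                       default_args))
               for name in names}

    # The internal sub-DAGs run sequentially; the three external ones fan out
    # in parallel once the third internal sub-DAG has finished.
    (subdags["etl_internal_sub_dag1"]
     >> subdags["etl_internal_sub_dag2"]
     >> subdags["etl_internal_sub_dag3"])
    subdags["etl_internal_sub_dag3"] >> [subdags["etl_adzuna_sub_dag"],
                                         subdags["etl_adwords_sub_dag"],
                                         subdags["etl_facebook_sub_dag"]]

Note that the branches only actually execute concurrently if the executor supports parallelism (for example LocalExecutor or CeleryExecutor rather than the default SequentialExecutor) and the parallelism settings in airflow.cfg allow it.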

Test DAG run for Airflow 1.9 in unittest

青春壹個敷衍的年華 submitted on 2020-02-02 12:14:46
Question: I had implemented a test case for running an individual DAG, but it does not seem to work in 1.9, possibly due to the stricter pool handling introduced in Airflow 1.8. I am trying to run the test case below:

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator

class DAGTest(unittest.TestCase):
    def make_tasks(self):
        dag = DAG('test_dag', description='a test',
                  schedule_interval='@once',
                  start_date=datetime(2018,
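
As a point of comparison, a minimal sketch of a test that executes a single task directly, bypassing the scheduler and pool machinery; it assumes an Airflow 1.9/1.10-style environment with an initialized metadata database, and the DAG/task names are illustrative:

import unittest
from datetime import datetime

from airflow import DAG
from airflow.models import TaskInstance
from airflow.operators.python_operator import PythonOperator

DEFAULT_DATE = datetime(2018, 1, 1)


class DAGTest(unittest.TestCase):
    def test_python_task_runs(self):
        dag = DAG('test_dag', description='a test',
                  schedule_interval='@once', start_date=DEFAULT_DATE)
        task = PythonOperator(task_id='say_hello',
                              python_callable=lambda: 'hello',
                              dag=dag)
        # Build a TaskInstance and call execute() directly, so no pool slots
        # or scheduler involvement are needed.
        ti = TaskInstance(task=task, execution_date=DEFAULT_DATE)
        result = task.execute(ti.get_template_context())
        self.assertEqual(result, 'hello')


if __name__ == '__main__':
    unittest.main()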

Dynamic task definition in Airflow

怎甘沉沦 submitted on 2020-02-01 17:36:33
Question: I'm currently trying to use Airflow to orchestrate a process where some operators are defined dynamically and depend on the output of another (earlier) operator. In the code below, t1 updates a text file with new records (these are actually read from an external queue but, for simplicity, I hard-coded them as A, B and C here). Then, I want to create separate operators for each record read from that text file. These operators will create directories A, B and C, respectively, and in the Airflow UI
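
One commonly used workaround, sketched below under the assumption of Airflow 1.10-era imports (the file path and record values are illustrative), is to generate the downstream operators at DAG-parse time from whatever records are currently in the file:

import os
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

RECORDS_FILE = '/tmp/records.txt'

with DAG('dynamic_tasks',
         start_date=datetime(2020, 1, 1),
         schedule_interval='@daily') as dag:

    # t1 would normally pull records from the external queue; here it just
    # hard-codes A, B and C, as in the question.
    t1 = BashOperator(
        task_id='update_records',
        bash_command='printf "A\\nB\\nC\\n" > {}'.format(RECORDS_FILE))

    # One operator per record currently in the file, created at parse time.
    if os.path.exists(RECORDS_FILE):
        with open(RECORDS_FILE) as fh:
            for record in (line.strip() for line in fh if line.strip()):
                mkdir_task = BashOperator(
                    task_id='create_dir_{}'.format(record),
                    bash_command='mkdir -p /tmp/{}'.format(record))
                t1 >> mkdir_task

Because the task list is fixed each time the scheduler parses the file, tasks created this way only reflect the file contents from an earlier run; truly run-time-dynamic tasks usually need a different design, such as a single task that iterates over the records itself or, in Airflow 2.3+, dynamic task mapping.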

How to create, update and delete Airflow variables without using the GUI?

会有一股神秘感。 submitted on 2020-01-30 12:31:27
Question: I have been learning Airflow and writing DAGs for an ETL pipeline. It involves using the AWS environment (S3, Redshift). It deals with copying data from one bucket to another after storing it in Redshift. I am storing bucket names and prefixes as Variables in Airflow, for which you have to open the GUI and add them manually. Which is the safest and most widely used practice in the industry out of the following options? Can we use airflow.cfg to store our variables (bucket names) and access
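
For reference, a minimal sketch of managing Variables programmatically instead of through the GUI; the variable names and values are illustrative, and Variable.delete is only available in Airflow 2.x (older 1.10 releases can use the CLI instead):

from airflow.models import Variable

# Create or update (upsert) a variable in the metadata database.
Variable.set("s3_source_bucket", "my-source-bucket")

# Read it back, with an optional default if the key does not exist.
bucket = Variable.get("s3_source_bucket", default_var=None)

# Remove it again.
Variable.delete("s3_source_bucket")

The same operations are exposed on the command line (airflow variables --set/--get/--delete in 1.10, airflow variables set/get/delete in 2.x, depending on version), and variables can also be bulk-loaded from a JSON file with the CLI's import subcommand, which keeps them out of the GUI and under version control.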

When does an Airflow DAG definition get evaluated?

不问归期 submitted on 2020-01-30 09:16:25
Question: Suppose I have an Airflow DAG file that creates a graph like so...

def get_current_info(filename):
    current_info = {}
    <fill in info in current_info relevant for today's date for the given file>
    return current_info

files = [
    get_current_info("file_001"),
    get_current_info("file_002"),
    ....
]

for f in files:
    <some BashOperator bo1 using f's current info dict>
    <some BashOperator bo2 using f's current info dict>
    ....
    bo1 >> bo2
    ....

Since these values in the current_info dict that is used to define the
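
For what it's worth, top-level code in a DAG file runs every time the scheduler (and webserver) parses the file, which happens continuously at an interval governed by settings such as min_file_process_interval, not once per DAG run; date-dependent values are therefore usually computed inside the task instead. A minimal sketch of that pattern, reusing the question's hypothetical get_current_info helper with illustrative names:

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def get_current_info(filename, execution_date):
    # Placeholder for the question's helper, now keyed on the run's own date.
    return {"filename": filename, "date": execution_date}


def process_file(filename, **context):
    # Executed at task run time, so it sees this run's execution_date rather
    # than whatever date the scheduler last parsed the file on.
    info = get_current_info(filename, context["execution_date"])
    print(info)


with DAG("parse_time_demo",
         start_date=datetime(2020, 1, 1),
         schedule_interval="@daily") as dag:

    for name in ["file_001", "file_002"]:
        PythonOperator(
            task_id="process_{}".format(name),
            python_callable=process_file,
            op_kwargs={"filename": name},
            provide_context=True,  # needed in Airflow 1.x to receive the context
        )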

Run airflow process and airflow webserver as airflow user

*爱你&永不变心* submitted on 2020-01-30 09:00:08
Question: Problem: I am setting up a Google Compute Engine VM on GCP with Airflow installed on it. I am now trying to integrate Airflow with systemd by following the instructions at http://airflow.readthedocs.io/en/latest/configuration.html#integration-with-systemd; however, they assume that Airflow will run under airflow:airflow. How can I set up the Airflow installation so that whenever any user on that VM runs airflow from the shell, it runs behind the scenes as the airflow user? It is similar to hive
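
By way of illustration, the linked documentation ships example unit files that pin the services to the airflow account; a trimmed sketch of such a unit is below (the paths, environment file and airflow:airflow user/group are assumptions to adapt to the VM, not the exact shipped file):

[Unit]
Description=Airflow webserver daemon
After=network.target

[Service]
EnvironmentFile=/etc/default/airflow
User=airflow
Group=airflow
Type=simple
ExecStart=/usr/local/bin/airflow webserver --pid /run/airflow/webserver.pid
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target

Running the services this way only covers the daemons; making ad-hoc shell invocations of airflow also run as the airflow user is typically handled outside Airflow, for example with sudo -u airflow or a small wrapper script.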
