directed-acyclic-graphs

Schedule a DAG in Airflow to run every 5 minutes, starting from today (2019-12-18)

Submitted by 只谈情不闲聊 on 2020-06-29 06:00:54
Question: I am trying to run a DAG every 5 minutes starting from today (2019-12-18). I defined my start date as start_date: dt.datetime(2019, 12, 18, 10, 00, 00) and the schedule interval as schedule_interval='*/5 * * * *'. When I start the Airflow scheduler I don't see any of my tasks running. But when I change the start_date to dt.datetime(2019, 12, 17, 10, 00, 00), i.e. yesterday's date, the DAG runs continuously, roughly every 10 seconds rather than every 5 minutes. I think the solution to this ...
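Both symptoms follow from Airflow's scheduling model: a run is triggered only after its schedule interval has closed, and a start_date in the past gets backfilled. A minimal plain-datetime sketch of that rule (no Airflow needed; the backfill count assumes the default catchup behaviour):

```python
from datetime import datetime, timedelta

# Airflow triggers a DAG run only after its schedule interval has closed:
# the run stamped with execution_date = start_date actually fires one
# interval later, which is why a start_date of "today" seems to do nothing
# at first.
start_date = datetime(2019, 12, 18, 10, 0, 0)
interval = timedelta(minutes=5)

first_trigger_time = start_date + interval  # 2019-12-18 10:05:00

# With a start_date a day in the past and catchup enabled (the default),
# the scheduler immediately backfills every missed interval, which is why
# runs appear every few seconds instead of every five minutes.
missed_runs = (datetime(2019, 12, 18, 10, 0, 0)
               - datetime(2019, 12, 17, 10, 0, 0)) // interval
```

Setting catchup=False on the DAG (or catchup_by_default = False in airflow.cfg) suppresses the rapid backfill while keeping the 5-minute cadence going forward.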

Google Cloud Composer (Airflow): Dataflow job inside a DAG executes successfully, but the DAG fails

Submitted by 纵饮孤独 on 2020-06-27 07:29:26
Question: My DAG looks like this:

    default_args = {
        'start_date': airflow.utils.dates.days_ago(0),
        'retries': 0,
        'dataflow_default_options': {
            'project': 'test',
            'tempLocation': 'gs://test/dataflow/pipelines/temp/',
            'stagingLocation': 'gs://test/dataflow/pipelines/staging/',
            'autoscalingAlgorithm': 'BASIC',
            'maxNumWorkers': '1',
            'region': 'asia-east1'
        }
    }

    dag = DAG(
        dag_id='gcs_avro_to_bq_dag',
        default_args=default_args,
        description='ETL for loading data from GCS(present in the avro format) to BQ',
        ...
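One detail worth flagging in the snippet above: airflow.utils.dates.days_ago(0) yields a start_date that moves every day, which is generally discouraged in favour of a fixed datetime. A rough plain-Python stand-in showing what it evaluates to (this mirrors, but is not, the real Airflow helper):

```python
from datetime import datetime, timedelta

def days_ago(n):
    # Rough stand-in for airflow.utils.dates.days_ago: midnight (UTC)
    # n days before now. A dynamic value like days_ago(0) makes the DAG's
    # start_date move every day, so scheduling decisions become
    # unpredictable; a fixed datetime is safer.
    today = datetime.utcnow().replace(hour=0, minute=0, second=0, microsecond=0)
    return today - timedelta(days=n)
```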

DAG is not visible in the Airflow UI

Submitted by 自闭症网瘾萝莉.ら on 2020-06-17 09:34:28
Question: This is my DAG file in the dags folder:

    """
    Code that goes along with the Airflow tutorial located at:
    http://airflow.readthedocs.org/en/latest/tutorial.html
    """
    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator
    from airflow.operators.python_operator import PythonOperator
    from datetime import datetime, timedelta
    from work_file import Test

    class Main(Test):
        def __init__(self):
            super(Test, self).__init__()

        def create_dag(self):
            default_args = {
                "owner": "airflow",
                "depends_on ...
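A DAG built inside a method like create_dag above is invisible to Airflow unless the resulting object ends up at module top level: the scheduler's DagBag only discovers DAG objects it finds in a module's globals(). A sketch of the factory pattern (DagStub is a stand-in so the snippet runs without Airflow installed; in a real DAG file you would construct airflow.DAG the same way and bind it into globals()):

```python
class DagStub:
    # Stand-in for airflow.DAG so the pattern runs without Airflow installed.
    def __init__(self, dag_id, **kwargs):
        self.dag_id = dag_id

def create_dag(dag_id):
    # Factory building a DAG; with the real library this would be
    # DAG(dag_id, default_args=..., schedule_interval=...).
    return DagStub(dag_id)

# The crucial step: bind the DAG object to a module-level (global) name.
# Airflow only picks up DAGs it finds in a DAG file's globals().
dag = create_dag("my_visible_dag")
globals()[dag.dag_id] = dag
```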

How to trigger a DAG on the success of another DAG in Airflow using Python?

Submitted by 跟風遠走 on 2020-05-14 17:47:57
Question: I have a Python DAG parent job and a DAG child job. The tasks in the child job should be triggered on the successful completion of the parent job's tasks, which run daily. How can I add an external job trigger? My code:

    from datetime import datetime, timedelta
    from airflow import DAG
    from airflow.operators.postgres_operator import PostgresOperator
    from utils import FAILURE_EMAILS

    yesterday = datetime.combine(datetime.today() - timedelta(1), datetime.min.time())

    default_args = {
        'owner': 'airflow',
        ...

Trigger Cloud Composer DAG with a Pub/Sub message

Submitted by 走远了吗. on 2020-02-25 04:13:14
Question: I am trying to create a Cloud Composer DAG that is triggered by a Pub/Sub message. There is the following example from Google, which triggers a DAG every time a change occurs in a Cloud Storage bucket: https://cloud.google.com/composer/docs/how-to/using/triggering-with-gcf However, at the beginning it says you can trigger DAGs in response to events such as a change in a Cloud Storage bucket or a message pushed to Cloud Pub/Sub. I have spent a lot of time trying to figure out how that can be ...
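The Cloud Function from Google's Cloud Storage example can be rewired to a Pub/Sub trigger; what mainly changes is how the event is decoded. A sketch of that decoding step (extract_dag_conf is a hypothetical helper name; the actual DAG trigger call still goes through Composer's IAP-protected Airflow API exactly as in the linked guide):

```python
import base64
import json

def extract_dag_conf(event):
    # Hypothetical helper for a Pub/Sub-triggered Cloud Function:
    # Pub/Sub delivers the message body base64-encoded in event["data"].
    # The returned dict would be passed as the DAG run's conf when the
    # function calls Composer's (IAP-protected) Airflow REST API.
    payload = base64.b64decode(event["data"]).decode("utf-8")
    return {"pubsub_message": payload}

# Example Pub/Sub event shaped like what the function would receive:
event = {"data": base64.b64encode(b'{"order_id": 42}').decode("utf-8")}
conf = extract_dag_conf(event)
```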

Chained Spark column expressions with distinct window specs produce an inefficient DAG

Submitted by 醉酒当歌 on 2020-02-03 05:16:34
Question: Context: Let's say you are dealing with time-series data, and your desired outcome relies on multiple window functions with distinct window specifications. The result may resemble a single Spark column expression, like an identifier for intervals. Status quo: Usually I don't store intermediate results with df.withColumn, but rather chain/stack column expressions and trust Spark to find the most effective DAG (when dealing with a DataFrame). Reproducible example: However, in the following example (PySpark 2.4 ...

How to skip tasks in Airflow?

Submitted by 浪子不回头ぞ on 2020-01-23 07:51:19
Question: I'm trying to understand whether Airflow supports skipping tasks in a DAG for ad-hoc executions. Let's say my DAG graph looks like this: task1 > task2 > task3 > task4. I would like to start my DAG manually from task3; what is the best way of doing that? I've read about the ShortCircuitOperator, but I'm looking for a more ad-hoc solution that can apply once the execution is triggered. Thanks!

Answer 1: You can incorporate the SkipMixin that the ShortCircuitOperator uses under the hood to skip ...
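The decision of what to skip is simple to state on its own. A sketch (tasks_to_skip is an illustrative helper, not an Airflow API) of computing which upstream tasks an ad-hoc "start from task3" run would need to mark skipped:

```python
def tasks_to_skip(task_order, start_from):
    # For a linear DAG task1 > task2 > task3 > task4, an ad-hoc run that
    # should begin at start_from must skip everything upstream of it.
    # (Illustration only: inside Airflow each of these tasks would then be
    # skipped via SkipMixin.skip() or by raising AirflowSkipException.)
    if start_from not in task_order:
        raise ValueError("unknown task: %s" % start_from)
    return task_order[: task_order.index(start_from)]

order = ["task1", "task2", "task3", "task4"]
```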

How to use Apache Airflow in a virtual environment?

Submitted by 对着背影说爱祢 on 2020-01-17 01:05:52
Question: I am quite new to using Apache Airflow. I use PyCharm as my IDE. I create a project (Anaconda environment) and create a Python script that includes DAG definitions and Bash operators. When I open my Airflow webserver, my DAGs are not shown; only the default example DAGs are shown. My AIRFLOW_HOME variable contains ~/airflow, so I stored my Python script there and now it shows. How do I use this in a project environment? Do I change the environment variable at the start of every project? Is ...
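One common arrangement is to point AIRFLOW_HOME at a directory inside each project, set before Airflow is imported or its CLI is run (the directory name here is made up):

```python
import os
from pathlib import Path

# Point Airflow at a folder inside the current project. This must happen
# before `import airflow` (or before running the airflow CLI), because
# Airflow reads AIRFLOW_HOME once at startup. The directory name
# "airflow_home" is an arbitrary choice for this sketch.
project_home = Path.cwd() / "airflow_home"
os.environ["AIRFLOW_HOME"] = str(project_home)
```

The shell equivalent, export AIRFLOW_HOME="$PWD/airflow_home", can be appended to the virtualenv's bin/activate script so it is set automatically whenever that environment is active, which avoids re-exporting it per project by hand.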

How to assign “levels” to vertices of an acyclic directed graph?

Submitted by 。_饼干妹妹 on 2020-01-11 03:29:07
Question: I have a directed acyclic graph. I would like to assign levels to each vertex in a manner that guarantees that if the edge (v1,v2) is in the graph, then level(v1) > level(v2). I would also like level(v1) = level(v3) whenever (v1,v2) and (v3,v2) are in the graph. Also, the possible levels are discrete (might as well take them to be the natural numbers). The ideal case would be that level(v1) = level(v2) + 1 whenever (v1,v2) is in the graph and there is no other path from v1 to v2, but ...
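The ideal described at the end, level(v1) = level(v2) + 1 unless a longer path forces a bigger gap, is exactly the classic longest-path layering of a DAG, computable in O(V + E). A sketch (the function name is mine; note that the secondary wish, all predecessors of a vertex sharing a level, cannot always be satisfied at the same time, so this computes only the longest-path levels):

```python
from collections import defaultdict

def assign_levels(edges):
    # Longest-path layering of a DAG given as (u, v) edges, where (u, v)
    # means u must get a strictly higher level than v. Sinks get level 0;
    # every other vertex gets one more than its highest-level successor,
    # so level(u) exceeds level(v) by the length of the longest u->v path.
    succ = defaultdict(list)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        nodes.update((u, v))

    memo = {}

    def level(v):
        if v not in memo:
            memo[v] = 1 + max((level(w) for w in succ[v]), default=-1)
        return memo[v]

    for v in nodes:
        level(v)
    return memo
```

For example, with edges a->b, b->c, and a->c, vertex c is a sink at level 0, b gets level 1, and a gets level 2 (driven by the longer path a->b->c rather than the direct edge a->c).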