airflow-scheduler


Airflow DAG Running Every Second Rather Than Every Minute

Submitted by 感情迁移 on 2020-02-03 12:59:32
Question: I'm trying to schedule my DAG to run every minute, but it seems to be running every second instead. Based on everything I've read, I should just need to include schedule_interval='*/1 * * * *', # ..every 1 minute in my DAG and that's it, but it's not working. Here's a simple example I set up to test it:

from airflow import DAG
from airflow.operators import SimpleHttpOperator, HttpSensor, EmailOperator, S3KeySensor
from datetime import datetime, timedelta
from airflow.operators.bash_operator
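A common cause of this symptom (an assumption here, since the answer is truncated): an old start_date combined with catchup enabled makes the scheduler create one DAG run per past minute, so it fires runs back-to-back while clearing the backlog, which looks like "every second". A stdlib-only sketch of the arithmetic, with illustrative dates:

```python
from datetime import datetime, timedelta

def backlog_runs(start_date, now, interval):
    """Number of completed schedule intervals between start_date and now.

    With catchup enabled (the historical default), the scheduler creates
    one DAG run per past interval and executes them as fast as it can,
    which can look like the DAG is firing every second.
    """
    return max(0, int((now - start_date) / interval))

# Hypothetical dates: a start_date one day in the past with a 1-minute
# schedule_interval leaves 1440 runs to catch up on.
start = datetime(2020, 2, 1)
now = datetime(2020, 2, 2)
print(backlog_runs(start, now, timedelta(minutes=1)))  # -> 1440
```

The usual remedies are a static, recent start_date and catchup=False on the DAG, so only the current interval is scheduled.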

Airflow skip current task

Submitted by 一个人想着一个人 on 2020-01-24 12:15:07
Question: Is there a way for Airflow to skip the current task from within the (Python)Operator? For example:

def execute():
    if condition:
        skip_current_task()

task = PythonOperator(task_id='task', python_callable=execute, dag=some_dag)

Skipping downstream tasks doesn't suit me (a solution proposed in this answer: How to skip tasks on Airflow?), and neither does branching. Is there a way for a task to mark its own state as skipped from within the Operator?

Answer 1: Figured it out! Skipping the task is as easy as:

def execute(
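The truncated answer most likely raises AirflowSkipException from the callable; Airflow's task runner catches that exception and marks the task instance as skipped rather than failed. A stdlib-only sketch of that control flow, using a stand-in class in place of airflow.exceptions.AirflowSkipException:

```python
class AirflowSkipException(Exception):
    """Stand-in for airflow.exceptions.AirflowSkipException."""

def execute(condition):
    # Raising the skip exception inside the callable aborts the task
    # but ends it in the 'skipped' state, not 'failed'.
    if condition:
        raise AirflowSkipException("nothing to do for this run")
    return "done"

def run_task(callable_, *args):
    # Mimics how the task runner reacts to the exception.
    try:
        return callable_(*args), "success"
    except AirflowSkipException:
        return None, "skipped"

print(run_task(execute, True))   # -> (None, 'skipped')
print(run_task(execute, False))  # -> ('done', 'success')
```

In a real DAG file the import would simply be `from airflow.exceptions import AirflowSkipException`, raised inside the PythonOperator's callable.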

How to skip tasks on Airflow?

Submitted by 浪子不回头ぞ on 2020-01-23 07:51:19
Question: I'm trying to understand whether Airflow supports skipping tasks in a DAG for ad-hoc executions. Let's say my DAG graph looks like this: task1 > task2 > task3 > task4. I would like to start my DAG manually from task3; what is the best way of doing that? I've read about ShortCircuitOperator, but I'm looking for a more ad-hoc solution that can be applied once the execution is triggered. Thanks!

Answer 1: You can incorporate the SkipMixin that the ShortCircuitOperator uses under the hood to skip
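In Airflow, a leading task built on SkipMixin can call self.skip(...) on the task instances that should not run; the selection of which tasks to skip can be driven by a parameter passed at trigger time (e.g. via dag_run.conf). This stdlib-only sketch shows just the selection logic for a linear chain; the skipping itself would be done by SkipMixin in a real DAG, and the chain/IDs here are hypothetical:

```python
def tasks_to_skip(chain, start_from):
    """Given a linear task chain and the task id to start from
    (e.g. passed via dag_run.conf on a manual trigger), return the
    upstream tasks that a SkipMixin-based gate should mark skipped."""
    if start_from not in chain:
        raise ValueError(f"unknown task: {start_from}")
    return chain[:chain.index(start_from)]

chain = ["task1", "task2", "task3", "task4"]
print(tasks_to_skip(chain, "task3"))  # -> ['task1', 'task2']
print(tasks_to_skip(chain, "task1"))  # -> []
```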

Use airflow hive operator and output to a text file

Submitted by 有些话、适合烂在心里 on 2020-01-14 22:33:37
Question: Hi, I want to execute a Hive query using the Airflow Hive operator and output the result to a file. I don't want to use INSERT OVERWRITE here.

hive_ex = HiveOperator(
    task_id='hive-ex',
    hql='/sql/hive-ex.sql',
    hiveconfs={
        'DAY': '{{ ds }}',
        'YESTERDAY': '{{ yesterday_ds }}',
        'OUTPUT': '{{ file_path }}'+'csv',
    },
    dag=dag
)

What is the best way to do this? I know how to do this using the bash operator, but I want to know if we can use the Hive operator:

hive_ex = BashOperator(
    task_id='hive-ex',
    bash_command=
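The HiveOperator hands the hql to a Hive client and does not itself write results to a local file, so the common workaround (which the truncated BashOperator snippet appears to be heading toward) is to run the Hive CLI from bash and redirect its stdout. A stdlib-only sketch of building such a bash_command string; the paths and config keys are hypothetical:

```python
def hive_command(hql_path, output_path, hiveconfs):
    """Build a `hive -f ... > file` command with --hiveconf substitutions,
    suitable for a BashOperator's bash_command (Jinja macros like
    '{{ ds }}' are left in place for Airflow to render)."""
    confs = " ".join(f"--hiveconf {k}='{v}'" for k, v in sorted(hiveconfs.items()))
    return f"hive -f {hql_path} {confs} > {output_path}"

cmd = hive_command("/sql/hive-ex.sql", "/tmp/out.csv", {"DAY": "{{ ds }}"})
print(cmd)  # -> hive -f /sql/hive-ex.sql --hiveconf DAY='{{ ds }}' > /tmp/out.csv
```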

How to delete XCOM objects once the DAG finishes its run in Airflow

Submitted by 醉酒当歌 on 2020-01-14 08:11:11
Question: I have a huge JSON file in the XCom which I no longer need once the DAG execution is finished, but I still see the XCom object in the UI with all the data. Is there any way to delete the XCom programmatically once the DAG run is finished? Thank you.

Answer 1: You have to add a task, depending on your metadata DB (SQLite, PostgreSQL, MySQL...), that deletes the XCom once the DAG run is finished.

delete_xcom_task = PostgresOperator(
    task_id='delete-xcom-task',
    postgres_conn_id='airflow_db',
    sql="delete from
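The answer's approach is a trailing cleanup task that issues a DELETE against the xcom table in the metadata database. A stdlib-only sketch of that DELETE using sqlite3 in place of the PostgresOperator; the table layout here is simplified and hypothetical, not the real Airflow schema:

```python
import sqlite3

# Simulate a metadata DB with an xcom table holding rows for two DAGs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xcom (dag_id TEXT, key TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO xcom VALUES (?, ?, ?)",
    [("my_dag", "payload", '{"huge": "json"}'), ("other_dag", "k", "v")],
)

# The DELETE the cleanup task would issue, scoped to the finished DAG
# so other DAGs' XComs are untouched.
conn.execute("DELETE FROM xcom WHERE dag_id = ?", ("my_dag",))

remaining = conn.execute("SELECT dag_id FROM xcom").fetchall()
print(remaining)  # -> [('other_dag',)]
```

In a real DAG the cleanup would be the PostgresOperator shown in the answer (with trigger_rule set so it runs last), and the SQL would typically also filter on the run's execution_date.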

Dynamically Creating DAG based on Row available on DB Connection

Submitted by 六眼飞鱼酱① on 2020-01-06 05:38:31
Question: I want to create a DAG dynamically from a database table query. When I try to generate DAGs from either a fixed numeric range or from objects available in the Airflow settings, it succeeds. However, when I try to use a PostgresHook and create a DAG for each row of my table, I can see a new DAG generated whenever I add a new row to the table, but I can't click the newly created DAG in my Airflow web server UI. For more context, I'm
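The "DAG appears but isn't clickable" symptom usually means the webserver can't import the DAG object when it re-parses the file, typically because each generated DAG isn't bound to a unique module-level name. The standard pattern is to register each DAG in the module's globals(). A stdlib-only sketch of that registration, with a stand-in class in place of airflow.DAG and hard-coded rows in place of a PostgresHook query:

```python
class FakeDag:
    """Stand-in for airflow.DAG, just to show the registration pattern."""
    def __init__(self, dag_id):
        self.dag_id = dag_id

def register_dags(rows, registry):
    # In a real DAG file: rows = PostgresHook(...).get_records(...),
    # and registry would be globals() of the module, so the webserver
    # finds each DAG under a stable, unique top-level name.
    for row in rows:
        dag_id = f"generated_{row}"
        registry[dag_id] = FakeDag(dag_id)

registry = {}
register_dags(["a", "b"], registry)
print(sorted(registry))  # -> ['generated_a', 'generated_b']
```

Note also that the query runs on every scheduler/webserver parse, so all parsing processes must be able to reach the database, or the DAGs will exist in one component but not the other.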

Need to access schedule time in DockerOperator in Airflow

Submitted by 两盒软妹~` on 2020-01-06 05:08:39
Question: I need to access the schedule time in Airflow's DockerOperator. For example:

t1 = DockerOperator(
    task_id="task",
    dag=dag,
    image="test-runner:1.0",
    docker_url="xxx.xxx.xxx.xxx:2376",
    environment={"FROM": "{{ (execution_date + macros.timedelta(hours=6, minutes=30)).isoformat() }}"})

Basically, I need to populate the schedule time as a Docker environment variable.

Answer 1: First, macros only work in fields listed in template_fields. Second, you need to check which version of Airflow you are using; if you are using 1.9 or below,
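If the operator's `environment` argument is not in its `template_fields` for the Airflow version in use (the answer suggests this is the case for 1.9 and below; that version detail is taken on faith from the truncated answer), a common fix is to subclass the operator and extend `template_fields` so Jinja renders the macros. A stdlib-only sketch of that pattern, with a stand-in base class in place of the real DockerOperator:

```python
class DockerOperatorStub:
    """Stand-in for an operator whose environment is NOT templated."""
    template_fields = ("command",)

class TemplatedDockerOperator(DockerOperatorStub):
    # Adding 'environment' here tells Airflow's templating machinery to
    # render Jinja expressions (like execution_date macros) in that field.
    template_fields = DockerOperatorStub.template_fields + ("environment",)

print(TemplatedDockerOperator.template_fields)  # -> ('command', 'environment')
```

Recent Airflow versions already template `environment` on the DockerOperator, in which case the original snippet should work as written.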

Airflow DAG dynamic structure

Submitted by 寵の児 on 2020-01-04 14:24:09
Question: I'm looking for a solution where I can decide the DAG structure when the DAG is triggered, as I'm not sure about the number of operators I'll have to run. Please refer below for the execution sequence I'm planning to create:

              |-- Task B.1 --|                 |-- Task C.1 --|
              |-- Task B.2 --|                 |-- Task C.2 --|
    Task A ---|-- Task B.3 --|---> Task B ---> |-- Task C.3 --|
              |     ....     |                 |     ....     |
              |-- Task B.N --|                 |-- Task C.N --|

I'm not sure about the value of N. Is this possible in Airflow? If so, how do I
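The usual caveat is that Airflow builds the graph when the DAG file is parsed, not when a run is triggered, so N has to be resolvable at parse time (e.g. from a Variable or a database query) rather than from the trigger itself. Given that, the fan-out is just a loop. A stdlib-only sketch of the wiring, with a stand-in task class that mimics Airflow's `>>` operator:

```python
class FakeTask:
    """Stand-in for an Airflow operator, to show dependency wiring."""
    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []
    def __rshift__(self, other):  # mimics Airflow's `>>`
        self.downstream.append(other.task_id)
        return other

N = 3  # hypothetical; in practice read at parse time (Variable, DB row count)
task_a = FakeTask("A")
join = FakeTask("B")
for i in range(1, N + 1):
    b = FakeTask(f"B.{i}")
    task_a >> b   # fan out: A -> B.i
    b >> join     # fan in:  B.i -> B

print(task_a.downstream)  # -> ['B.1', 'B.2', 'B.3']
```

The C.1..C.N stage repeats the same loop downstream of the join task. For truly run-time-decided N, the options in this Airflow era were workarounds (e.g. triggering a second, freshly generated DAG).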
