Question:
I have started the Airflow webserver and scheduled some DAGs. I can see the DAGs in the web GUI.
How can I delete a particular DAG so that it is no longer run or shown in the web GUI? Is there an Airflow CLI command to do that?
I looked around but could not find a simple way of deleting a DAG once it has been loaded and scheduled.
Answer 1:
This is my adapted code, using PostgresHook with the default airflow_db connection ID.
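import sys

from airflow.hooks.postgres_hook import PostgresHook

dag_input = sys.argv[1]
hook = PostgresHook(postgres_conn_id="airflow_db")

# Child tables first, the dag table last.
for t in ["xcom", "task_instance", "sla_miss", "log", "job", "dag_run", "dag"]:
    # Table names come from the fixed list above; the dag_id is passed as a
    # bound parameter instead of being formatted into the SQL string.
    sql = "delete from {} where dag_id = %s".format(t)
    hook.run(sql, autocommit=True, parameters=(dag_input,))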
Answer 2:
Not sure why Apache Airflow doesn't have an obvious and easy way to delete a DAG.
I filed https://issues.apache.org/jira/browse/AIRFLOW-1002
Answer 3:
I just wrote a script that deletes everything related to a particular DAG, but it only works with MySQL. If you are using PostgreSQL, you can write a different connection method. The commands were originally posted by Lance on https://groups.google.com/forum/#!topic/airbnb_airflow/GVsNsUxPRC0; I just put them in a script. Hope this helps. Usage: python script.py dag_id
import sys

import MySQLdb

dag_input = sys.argv[1]

# A list, not a set, so the statements run in a fixed order (dag last).
tables = ["xcom", "task_instance", "sla_miss", "log", "job", "dag_run", "dag"]


def connect(query, dag_id):
    db = MySQLdb.connect(host="hostname", user="username",
                         passwd="password", db="database")
    cur = db.cursor()
    # Pass dag_id as a parameter so MySQLdb quotes it safely.
    cur.execute(query, (dag_id,))
    db.commit()
    db.close()


for table in tables:
    query = "delete from {} where dag_id = %s".format(table)
    print(query)
    connect(query, dag_input)
Answer 4:
I've written a script that deletes all metadata related to a specific DAG for the default SQLite DB. It is based on Jesus's answer above, adapted from Postgres to SQLite. Change ../airflow.db to the path of the default airflow.db file (usually in ~/airflow) relative to where script.py is stored. To execute, use python script.py dag_id.
import sqlite3
import sys

conn = sqlite3.connect('../airflow.db')
c = conn.cursor()
dag_input = sys.argv[1]

for t in ["xcom", "task_instance", "sla_miss", "log", "job", "dag_run", "dag"]:
    # sqlite3 uses "?" placeholders; the table name comes from the fixed list.
    query = "delete from {} where dag_id = ?".format(t)
    c.execute(query, (dag_input,))

conn.commit()
conn.close()
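The scripts above all follow the same pattern: delete every row for one dag_id from a fixed list of metadata tables. A minimal sketch of that pattern as a reusable function, assuming the SQLite metastore and the same table list as above (the name delete_dag_metadata is hypothetical, not part of Airflow), wraps all the deletes in one transaction so a failure part-way through leaves nothing half-deleted:

```python
import sqlite3

# Table list taken from the scripts above; child tables first, "dag" last.
DAG_TABLES = ["xcom", "task_instance", "sla_miss", "log", "job", "dag_run", "dag"]


def delete_dag_metadata(conn, dag_id, tables=DAG_TABLES):
    """Delete every row for dag_id from the given tables in one transaction.

    Returns a dict mapping each table name to the number of rows deleted.
    """
    deleted = {}
    # "with conn" commits if every statement succeeds and rolls back otherwise.
    with conn:
        for table in tables:
            cur = conn.execute(
                "DELETE FROM {} WHERE dag_id = ?".format(table), (dag_id,)
            )
            deleted[table] = cur.rowcount
    return deleted
```

For example, delete_dag_metadata(sqlite3.connect('../airflow.db'), 'my_dag_id') returns the per-table row counts, which makes it easy to check what was actually removed.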
Answer 5:
There is nothing built into Airflow that does that for you. To delete the DAG, delete its file from the repository and delete its entries from the dag table in the Airflow metastore.
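The two steps in this answer can be sketched as follows, assuming a SQLite metastore and a DAG defined in a single file whose path you know (remove_dag is a hypothetical helper, and "the repository" is taken here to mean the local dags folder). The file is removed first so the scheduler has nothing left to re-register the DAG from; as the answer says, only the dag table is touched:

```python
import sqlite3
from pathlib import Path


def remove_dag(dag_file, conn, dag_id):
    """Delete a DAG's file from the dags folder and its row in the dag table.

    Returns True if the file existed and was removed, False otherwise.
    """
    path = Path(dag_file)
    existed = path.exists()
    if existed:
        path.unlink()
    # Only the dag table, as described above; rows in other tables remain.
    with conn:
        conn.execute("DELETE FROM dag WHERE dag_id = ?", (dag_id,))
    return existed
```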
Answer 6:
You can clear a set of task instances, as if they never ran, with:
airflow clear dag_id -s 2017-1-23 -e 2017-8-31
Then remove the DAG file from the dags folder.
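The clear-then-remove sequence can be scripted roughly like this. It is a sketch that uses exactly the CLI command from the answer (clear_then_remove is a hypothetical helper). check=True makes a failed CLI call stop the script before the file is deleted, and the run parameter is only there so the command can be inspected without Airflow installed. If the CLI asks for confirmation, the prompt still reaches your terminal because subprocess.run inherits stdin:

```python
import subprocess
from pathlib import Path


def clear_then_remove(dag_id, start, end, dag_file, run=subprocess.run):
    """Clear a DAG's task instances with the Airflow CLI, then delete its file.

    Returns the CLI command that was run.
    """
    # Same command as in the answer: airflow clear <dag_id> -s <start> -e <end>
    cmd = ["airflow", "clear", dag_id, "-s", start, "-e", end]
    # check=True raises CalledProcessError on failure, so the file is kept.
    run(cmd, check=True)
    # Raises FileNotFoundError if the file is already gone.
    Path(dag_file).unlink()
    return cmd
```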
Answer 7:
We have this feature now!
PR #2199 (Jira: AIRFLOW-1002), which adds DAG removal to Airflow, has now been merged. It allows fully deleting a DAG's entries from all of the related tables.
The core delete_dag(...) code is now part of the experimental API, and there are entrypoints available via the CLI and also via the REST API.
CLI:
airflow delete_dag my_dag_id
REST API (running webserver locally):
curl -X "DELETE" http://127.0.0.1:8080/api/experimental/dags/my_dag_id/delete_dag
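For scripting, the same call as the curl example can be made from Python's standard library. This sketch copies the endpoint path exactly as the answer gives it for master at the time; the route in the release you run may differ, so check it against your version's API documentation. It also assumes, like the curl command, that the webserver accepts unauthenticated requests. build_delete_dag_request is a hypothetical helper that only builds the request:

```python
import urllib.parse
import urllib.request


def build_delete_dag_request(dag_id, base_url="http://127.0.0.1:8080"):
    """Build (but do not send) the DELETE request from the curl example."""
    # Path copied from the answer's curl command; dag_id is URL-quoted.
    url = "{}/api/experimental/dags/{}/delete_dag".format(
        base_url, urllib.parse.quote(dag_id, safe="")
    )
    return urllib.request.Request(url, method="DELETE")
```

Sending it is then urllib.request.urlopen(build_delete_dag_request("my_dag_id")), which raises HTTPError if the webserver returns an error status.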
There hasn't been a new release that includes this feature yet, but you can use it on master today, and it will be available via PyPI in the next release, 1.10.
Answer 8:
For those still looking for answers: on Airflow 1.8 it's very difficult to delete a DAG, so refer to the answers above. Since 1.9 was released, you just have to remove the DAG from the dags folder and run airflow resetdb. Note that resetdb drops and recreates the entire metadata database, so the metadata of every other DAG is lost as well.