Airflow: how to delete a DAG?

Posted by 只谈情不闲聊 on 2019-11-27 03:10:56

Edit 8/27/18 - Airflow 1.10 is now released on PyPI!

https://pypi.org/project/apache-airflow/1.10.0/


How to delete a DAG completely

We have this feature now in Airflow ≥ 1.10!

PR #2199 (Jira: AIRFLOW-1002), which adds DAG removal to Airflow, has now been merged. It allows fully deleting a DAG's entries from all of the related tables.

The core delete_dag(...) code is now part of the experimental API, and there are entrypoints available via the CLI and also via the REST API.

CLI:

airflow delete_dag my_dag_id

REST API (running webserver locally):

curl -X "DELETE" http://127.0.0.1:8080/api/experimental/dags/my_dag_id

Warning regarding the REST API: Ensure that your Airflow cluster uses authentication in production.
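
The same call can be made from Python. This is a minimal sketch using only the standard library; the helper names build_delete_request and delete_dag are made up for illustration, and the base URL is the local webserver assumed in the curl example. It does not handle authentication, which a production setup would need.

```python
import urllib.parse
import urllib.request


def build_delete_request(base_url, dag_id):
    """Build (but do not send) a DELETE request for the experimental API."""
    # Quote the dag_id so unusual characters cannot alter the URL path.
    safe_id = urllib.parse.quote(dag_id, safe="")
    url = "{}/api/experimental/dags/{}".format(base_url.rstrip("/"), safe_id)
    return urllib.request.Request(url, method="DELETE")


def delete_dag(base_url, dag_id, timeout=10):
    """Send the DELETE request and return the HTTP status code."""
    request = build_delete_request(base_url, dag_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Usage against a hypothetical local webserver:
#   delete_dag("http://127.0.0.1:8080", "my_dag_id")
```

Splitting request construction from sending makes it easy to inspect the URL and method before anything is actually deleted.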

Installing / upgrading to Airflow 1.10 (current)

To upgrade, run either:

export SLUGIFY_USES_TEXT_UNIDECODE=yes

or:

export AIRFLOW_GPL_UNIDECODE=yes

Then:

pip install -U apache-airflow

Remember to check UPDATING.md first for the full details!

This is my adapted code using PostgresHook with the default connection_id.

import sys
from airflow.hooks.postgres_hook import PostgresHook

dag_input = sys.argv[1]
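hook = PostgresHook(postgres_conn_id="airflow_db")

# Table names come from this fixed list; only dag_id is user input, so it is bound as a parameter.
for t in ["xcom", "task_instance", "sla_miss", "log", "job", "dag_run", "dag"]:
    sql = "delete from {} where dag_id = %s".format(t)
    hook.run(sql, autocommit=True, parameters=(dag_input,))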

Not sure why Apache Airflow doesn't have an obvious and easy way to delete a DAG

Filed https://issues.apache.org/jira/browse/AIRFLOW-1002

I just wrote a script that deletes everything related to a particular DAG, but it is only for MySQL. You can write a different connector method if you are using PostgreSQL. The commands were originally posted by Lance at https://groups.google.com/forum/#!topic/airbnb_airflow/GVsNsUxPRC0; I just put them in a script. Hope this helps.

Format: python script.py dag_id

import sys
import MySQLdb

dag_input = sys.argv[1]

# A list (not a set) keeps the order fixed: child tables first, the dag table last.
tables = ["xcom", "task_instance", "sla_miss", "log", "job", "dag_run", "dag"]

def delete_dag(dag_id):
    # Replace the placeholder credentials with those of your Airflow metadata database.
    db = MySQLdb.connect(host="hostname", user="username", passwd="password", db="database")
    cur = db.cursor()
    for table in tables:
        # dag_id is passed as a bound parameter instead of being concatenated into the SQL.
        query = "delete from {} where dag_id = %s".format(table)
        print(query)
        cur.execute(query, (dag_id,))
    db.commit()
    db.close()

delete_dag(dag_input)

I've written a script that deletes all metadata related to a specific dag for the default SQLite DB. This is based on Jesus's answer above but adapted from Postgres to SQLite. Users should set ../airflow.db to wherever script.py is stored relative to the default airflow.db file (usually ~/airflow). To execute, use python script.py dag_id.

import sqlite3
import sys

conn = sqlite3.connect('../airflow.db')
c = conn.cursor()

dag_input = sys.argv[1]

for t in ["xcom", "task_instance", "sla_miss", "log", "job", "dag_run", "dag"]:
    # Bind dag_id as a parameter rather than formatting it into the SQL string.
    query = "delete from {} where dag_id = ?".format(t)
    c.execute(query, (dag_input,))

conn.commit()
conn.close()
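
The manual-cleanup scripts in this thread all follow the same pattern: loop over a fixed list of tables and delete the rows for one dag_id. Below is a small self-contained sketch of that pattern with sqlite3. The function name delete_dag_rows is hypothetical, and the table list is simply the one used in the scripts above; your Airflow version may have more tables. The dag_id is bound as a parameter, and the whole cleanup runs in one transaction so a failure part-way leaves nothing half-deleted.

```python
import sqlite3

# Tables keyed by dag_id, with "dag" last (list taken from the scripts in this thread).
TABLES = ["xcom", "task_instance", "sla_miss", "log", "job", "dag_run", "dag"]


def delete_dag_rows(conn, dag_id, tables=TABLES):
    """Delete every row for dag_id and return {table: rows_deleted}."""
    deleted = {}
    with conn:  # commits on success, rolls back if any statement raises
        for table in tables:
            # Table names cannot be bound, so they come only from the fixed list.
            cur = conn.execute(
                "DELETE FROM {} WHERE dag_id = ?".format(table), (dag_id,)
            )
            deleted[table] = cur.rowcount
    return deleted


# Usage, assuming the default SQLite metadata file:
#   conn = sqlite3.connect("airflow.db")
#   print(delete_dag_rows(conn, "my_dag_id"))
```

Returning the per-table row counts gives a quick check that the dag_id was spelled correctly: all zeros usually means nothing matched.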

DAGs can be deleted in Airflow 1.10, but the sequence of actions must be right. There is a chicken-and-egg problem: if you delete the DAG from the frontend while the file is still there, the DAG is reloaded (because the file has not been deleted). If you delete the file first and then refresh the page, the DAG can no longer be deleted from the web GUI. The sequence of actions that let me delete a DAG from the frontend was:

  1. Delete the DAG file (in my case, delete it from the pipeline repository and deploy to the Airflow servers, especially the scheduler).
  2. DO NOT refresh the web GUI.
  3. In the web GUI, in the DAGs view (the normal front page), click "Delete dag" (the red icon on the far right).
  4. This cleans up all remains of the DAG from the database.
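
The same ordering (file first, metadata second) can be scripted. This is only a sketch under assumptions: it swaps the web GUI click for the airflow delete_dag CLI command quoted elsewhere in this thread for Airflow >= 1.10, and the function name remove_dag and its arguments are made up for illustration. By default it only returns the command so you can review it before running anything.

```python
import os
import subprocess


def remove_dag(dags_folder, dag_filename, dag_id, run_cli=False):
    """Remove the DAG file, then build (and optionally run) the metadata delete."""
    path = os.path.join(dags_folder, dag_filename)
    if os.path.exists(path):
        # The file goes first; otherwise the DAG is picked up again.
        os.remove(path)
    # CLI form quoted in this thread for Airflow >= 1.10.
    command = ["airflow", "delete_dag", dag_id]
    if run_cli:
        # Raises CalledProcessError if the CLI exits with a non-zero status.
        subprocess.check_call(command)
    return command


# Usage with hypothetical paths:
#   remove_dag(os.path.expanduser("~/airflow/dags"), "my_dag.py", "my_dag_id", run_cli=True)
```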

Airflow 1.10.1 has been released. This release adds the ability to delete a DAG from the web UI after you have deleted the corresponding DAG from the file system.

See this ticket for more details:

[AIRFLOW-2657] Add ability to delete DAG from web ui

Please note that this doesn't actually delete the DAG from the file system. You need to do that manually first; otherwise the DAG will get reloaded.

There is nothing built into Airflow that does this for you. To delete a DAG, delete it from the repository and delete its database entries in the dag table of the Airflow metastore.

You can clear a set of task instances, as if they had never run, with:

airflow clear dag_id -s 2017-1-23 -e 2017-8-31

Then remove the DAG file from the dags folder.

versions >= 1.10.0:

airflow delete_dag <dag_id>

versions <= 1.9.0:

There is no command to delete a DAG, so you need to first delete the DAG file and then delete all references to the dag_id from the Airflow metadata database.

WARNING

You can reset the Airflow metadata database. This erases everything, including the DAGs, but remember that it also erases the history, pools, variables, etc.

Run airflow resetdb and then airflow initdb.

Based on the answer of @OlegYamin, I'm doing the following to delete a dag backed by postgres, where airflow uses the public schema.

delete from public.dag_pickle where id = (
    select pickle_id from public.dag where dag_id = 'my_dag_id'
);
delete from public.dag_run where dag_id = 'my_dag_id';
delete from public.dag_stats where dag_id = 'my_dag_id';
delete from public.log where dag_id = 'my_dag_id';
delete from public.sla_miss where dag_id = 'my_dag_id';
delete from public.task_fail where dag_id = 'my_dag_id';
delete from public.task_instance where dag_id = 'my_dag_id';
delete from public.xcom where dag_id = 'my_dag_id';
delete from public.dag where dag_id = 'my_dag_id';

WARNING: The effect/correctness of the first delete query is unknown to me. It is just an assumption that it is needed.
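
To avoid editing 'my_dag_id' by hand in every statement, the list can be generated. The sketch below only builds the statements, in the same order as above, with psycopg2-style %s placeholders; it does not connect to anything. The function name build_delete_statements is hypothetical, and the dag_pickle statement carries the same caveat as the warning above.

```python
# Tables cleaned by dag_id, in the order used above; "dag" goes last.
DAG_ID_TABLES = [
    "dag_run", "dag_stats", "log", "sla_miss",
    "task_fail", "task_instance", "xcom", "dag",
]


def build_delete_statements(dag_id, schema="public"):
    """Return a list of (sql, params) pairs for one dag_id."""
    statements = [(
        # Must run before the dag row is deleted, since it looks up pickle_id there.
        "delete from {0}.dag_pickle where id = "
        "(select pickle_id from {0}.dag where dag_id = %s)".format(schema),
        (dag_id,),
    )]
    for table in DAG_ID_TABLES:
        sql = "delete from {}.{} where dag_id = %s".format(schema, table)
        statements.append((sql, (dag_id,)))
    return statements


# Usage with an already-open DB-API cursor (assumed, not created here):
#   for sql, params in build_delete_statements("my_dag_id"):
#       cursor.execute(sql, params)
```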

user2892949

Just delete it from MySQL; that works fine for me. Delete the DAG's rows from the tables below:

  • dag
  • dag_constructor
  • dag_group_ship
  • dag_pickle
  • dag_run
  • dag_stats

(There might be more tables in future releases.) Then restart the webserver and worker.

Remove the DAG you want to delete from the dags folder and run airflow resetdb. Note that this resets the entire metadata database, not just the entries for that DAG.

Alternatively, you can go into airflow_db and manually delete the DAG's entries from the related tables (task_fail, xcom, task_instance, sla_miss, log, job, dag_run, dag, dag_stats).

For those who are still looking for answers: on Airflow 1.8 it is very difficult to delete a DAG; you can refer to the answers above. Since 1.9 was released, you just have to remove the DAG from the dags folder and restart the webserver.
