duplicate key value violates unique constraint when adding path variable in airflow dag

Submitted by 别来无恙 on 2019-12-31 06:04:49

Question


To set up the connections and variables in Airflow I use a DAG; we do this so that we can set Airflow up again quickly if we ever have to rebuild everything. It mostly works: my connections and variables show up, but the task "fails". The error says that an sql_path variable already exists:

[2018-03-30 19:42:48,784] {{models.py:1595}} ERROR - (psycopg2.IntegrityError) duplicate key value violates unique constraint "variable_key_key"
DETAIL:  Key (key)=(sql_path) already exists.
 [SQL: 'INSERT INTO variable (key, val, is_encrypted) VALUES (%(key)s, %(val)s, %(is_encrypted)s) RETURNING variable.id'] [parameters: {'key': 'sql_path', 'val': 'gAAAAABavpM46rWjISLZRRKu4hJRD7HFKMuXMpmJ5Z3DyhFbFOQ91cD9NsQsYyFof_pdPn116d6yNoNoOAqx_LRqMahjbYKUqrhNRiYru4juPv4JEGAv2d0=', 'is_encrypted': True}] (Background on this error at: http://sqlalche.me/e/gkpj)
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 507, in do_execute
    cursor.execute(statement, parameters)
psycopg2.IntegrityError: duplicate key value violates unique constraint "variable_key_key"
DETAIL:  Key (key)=(sql_path) already exists.

However, I checked: before I run the DAG, the Ad Hoc Query SELECT * FROM variable returns nothing, and afterwards it returns my two variables.

I also checked whether I create the variable twice, but I don't think I do. Here is the part of the DAG that creates the path variables:

import airflow
from datetime import datetime, timedelta
from airflow.operators.python_operator import PythonOperator
from airflow import models
from airflow.settings import Session
import logging


args = {
    'owner': 'airflow',
    'start_date': airflow.utils.dates.days_ago(1),
    'provide_context': True
}


def init_staging_airflow():
    logging.info('Creating connections, pool and sql path')

    session = Session()

    new_var = models.Variable()
    new_var.key = "sql_path"
    new_var.set_val("/usr/local/airflow/sql")
    session.add(new_var)
    session.commit()

    new_var = models.Variable()
    new_var.key = "conf_path"
    new_var.set_val("/usr/local/airflow/conf")
    session.add(new_var)
    session.commit()

    # new_pool is defined elsewhere in the DAG file (elided from the question)
    session.add(new_pool)
    session.commit()

    session.close()

dag = airflow.DAG(
    'init_staging_airflow',
    schedule_interval="@once",
    default_args=args,
    max_active_runs=1)

t1 = PythonOperator(task_id='init_staging_airflow',
                    python_callable=init_staging_airflow,
                    provide_context=False,
                    dag=dag)

Answer 1:


I ran into the same problem when trying to do Variable.set() inside a DAG. I believe the scheduler will constantly poll the DagBag to refresh any changes dynamically. That's why you see a ton of these when running the webserver:

[2018-04-02 11:28:41,531] [45914] {models.py:168} INFO - Filling up the DagBag from /Users/jasontang/XXX/data-server/dags

Sooner or later you'll hit the key constraint.

What I did was put all the variables I need at runtime into a global dictionary ("VARIABLE_DICT" in the example below) and let all my DAGs and sub-DAGs access it.

import os

from airflow.models import Variable

VARIABLE_DICT = {}  # shared at module level

def initialize(dag_run_obj):
    global VARIABLE_DICT
    if dag_run_obj.external_trigger:
        VARIABLE_DICT.update(dag_run_obj.conf)
        values = (dag_run_obj.conf['client'],
                  dag_run_obj.conf['vertical'],
                  dag_run_obj.conf['frequency'],
                  dag_run_obj.conf.get('snapshot'))
        config_file = '{0}-{1}/{0}-{1}-{2}.json'.format(*values)
        path = os.path.join(Variable.get('repo_root'), 'conf', config_file)
        VARIABLE_DICT.update(read_config(path))  # read_config: the answerer's own JSON helper

You could ignore the dag_run_obj part, since I specifically look for any additional configuration values provided to the DAG Run when it is created. In your other DAGs and subDAGs just import the dictionary.
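The approach above can be boiled down to a small shared module that every DAG imports instead of writing Airflow Variables at parse time. A minimal sketch (the module name `shared_config.py` and the example keys are assumptions, not from the answer):

```python
# shared_config.py (hypothetical module name): one place to hold the values
# that would otherwise be written to Airflow Variables at parse time.
VARIABLE_DICT = {}

def initialize(conf):
    """Merge a DAG Run's conf dict into the shared dictionary."""
    VARIABLE_DICT.update(conf)

# A DAG file would then do:
#   from shared_config import VARIABLE_DICT, initialize
#   sql_path = VARIABLE_DICT.get("sql_path", "/usr/local/airflow/sql")
initialize({"sql_path": "/usr/local/airflow/sql"})
```

Because updating a plain dictionary never touches the metadata database, re-parsing the DAG file cannot trigger the unique-constraint error.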




Answer 2:


justang is correct: this happens because the scheduler executes your DAG file every time it runs (the scheduler runs frequently to check whether your DAGs have changed, whether they need to be started, and so on).
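Whatever triggers the repeated execution, the durable fix is to make the initialization idempotent so that running it a second time is harmless. A stand-alone sketch of the check-before-insert pattern (using stdlib sqlite3 as a stand-in for Airflow's Postgres metadata database; the table and columns mirror the INSERT statement in the traceback above):

```python
import sqlite3

# Stand-in for Airflow's metadata DB: the same unique constraint on "key"
# that produced the IntegrityError in the question's traceback.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE variable (
    id INTEGER PRIMARY KEY,
    key TEXT UNIQUE,
    val TEXT,
    is_encrypted INTEGER)""")

def set_variable(conn, key, val):
    """Insert the variable only if the key is not already present,
    so running the initialization DAG twice raises no error."""
    row = conn.execute("SELECT 1 FROM variable WHERE key = ?", (key,)).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO variable (key, val, is_encrypted) VALUES (?, ?, 0)",
            (key, val))
        conn.commit()

set_variable(conn, "sql_path", "/usr/local/airflow/sql")
set_variable(conn, "sql_path", "/usr/local/airflow/sql")  # second run: no error
```

In the asker's DAG the same guard would be a `session.query(models.Variable).filter_by(key=...)` check before `session.add()`.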



Source: https://stackoverflow.com/questions/49581440/duplicate-key-value-violates-unique-constraint-when-adding-path-variable-in-airf
