Duplicate key value violates unique constraint when adding path variable in Airflow DAG

Posted by 我的未来我决定 on 2019-12-02 10:31:41

I ran into the same problem when calling Variable.set() inside a DAG file. I believe the scheduler constantly re-parses the DAG files (refilling the DagBag) to pick up changes dynamically, so any top-level code in the file runs again on every pass. That's why you see a ton of these lines when running the webserver:

[2018-04-02 11:28:41,531] [45914] {models.py:168} INFO - Filling up the DagBag from /Users/jasontang/XXX/data-server/dags

Sooner or later you'll hit the unique key constraint.
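To make the failure mode concrete, here is a toy simulation. It does not use Airflow at all: the `variable` table, the `parse_dag_file` function, and the `path` key are hypothetical stand-ins, and the plain INSERT is only a simplification of whatever write the DAG file performs. The real write path and timing in Airflow are more involved, so treat this as a sketch of the idea that a write repeated on every parse eventually collides with a unique key.

```python
import sqlite3

# Toy stand-in for the metadata database's variable table (not Airflow's schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variable (key TEXT UNIQUE, val TEXT)")


def parse_dag_file():
    """Stands in for one parse of a DAG file whose top-level code writes a variable."""
    conn.execute(
        "INSERT INTO variable (key, val) VALUES (?, ?)", ("path", "/tmp/data")
    )


parse_count = 0
error = None
for _ in range(3):  # the file is parsed over and over
    try:
        parse_dag_file()
        parse_count += 1
    except sqlite3.IntegrityError as exc:
        # The second parse tries to write the same key again.
        error = str(exc)
        break
```

Only the first parse succeeds; the repeat is rejected by the unique constraint, which is why moving the write out of top-level DAG code avoids the error.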

What I did was put all the variables I need to set at runtime into a global dictionary ("VARIABLE_DICT" in the example below) and let all my DAGs and sub-DAGs access it.

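import os

from airflow.models import Variable

# Module-level store shared by every DAG and sub-DAG that imports it.
VARIABLE_DICT = {}


def initialize(dag_run_obj):
    # read_config is my own helper (not shown); it presumably loads the JSON file into a dict.
    # VARIABLE_DICT is mutated in place, so no `global` statement is needed.
    if dag_run_obj.external_trigger:
        VARIABLE_DICT.update(dag_run_obj.conf)
        values = (dag_run_obj.conf['client'],
                  dag_run_obj.conf['vertical'],
                  dag_run_obj.conf['frequency'],
                  dag_run_obj.conf.get('snapshot'))
        # Only the first three values are used by the template.
        config_file = '{0}-{1}/{0}-{1}-{2}.json'.format(*values)
        path = os.path.join(Variable.get('repo_root'), 'conf', config_file)
        VARIABLE_DICT.update(read_config(path))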
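As a worked example of the path construction in initialize above, the following sketch runs the same formatting step without Airflow. The conf values (`acme`, `retail`, `daily`) and the `/repo` root are made up for illustration; only the key names and the template string come from the snippet.

```python
import os
from types import SimpleNamespace

# Hypothetical trigger payload; only the key names come from the snippet above.
dag_run_obj = SimpleNamespace(
    external_trigger=True,
    conf={"client": "acme", "vertical": "retail", "frequency": "daily"},
)

values = (dag_run_obj.conf["client"],
          dag_run_obj.conf["vertical"],
          dag_run_obj.conf["frequency"],
          dag_run_obj.conf.get("snapshot"))  # None here; the template never uses it

# str.format ignores the unused fourth positional argument.
config_file = "{0}-{1}/{0}-{1}-{2}.json".format(*values)

# "/repo" is a placeholder for the value of Variable.get('repo_root').
path = os.path.join("/repo", "conf", config_file)
```

With these inputs, config_file comes out as acme-retail/acme-retail-daily.json, placed under the conf directory of the repository root.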

You can ignore the dag_run_obj part; I use it only to pick up any additional configuration values provided to the DAG Run when it is created. In your other DAGs and sub-DAGs, just import the dictionary.
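The import step might look roughly like the sketch below. To keep it runnable as a single file, it registers a hypothetical module named `shared_variables` in `sys.modules` instead of creating a separate file; in a real project that module would simply be the .py file that defines VARIABLE_DICT. The function names and conf values are also assumptions made for illustration.

```python
import sys
import types

# Hypothetical module name; stands in for the file that defines VARIABLE_DICT.
shared = types.ModuleType("shared_variables")
shared.VARIABLE_DICT = {}
sys.modules["shared_variables"] = shared


def initialize(conf):
    """Main DAG side: fill the shared dictionary once."""
    from shared_variables import VARIABLE_DICT
    VARIABLE_DICT.update(conf)


def build_sub_dag_id():
    """Sub-DAG side: read the values that the main DAG stored."""
    from shared_variables import VARIABLE_DICT
    return "{client}-{vertical}".format(**VARIABLE_DICT)


initialize({"client": "acme", "vertical": "retail"})
sub_dag_id = build_sub_dag_id()
```

Both functions see the same dictionary object because Python caches imported modules. Note that the dictionary lives in one process's memory, so this only helps code that runs in the same process that populated it.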

justang is correct: this happens because the scheduler executes your DAG file every time it runs, and it runs frequently to check whether your DAGs have changed, whether they need to be started, and so on.
