Airflow “This DAG isn't available in the webserver DagBag object”

天涯浪人 2020-12-03 00:42

When I put a new DAG Python script in the dags folder, I can see a new DAG entry in the DAG UI, but it is not enabled automatically. On top of that, it does not seem to lo…

5 answers
  •  一个人的身影
    2020-12-03 01:19

    I have a theory about a possible cause of this issue in Google Cloud Composer. The Composer troubleshooting documentation has a section about DAG failures on the webserver, which says:

    Avoid running heavyweight computation at DAG parse time. Unlike the worker and scheduler nodes, whose machine types can be customized to have greater CPU and memory capacity, the webserver uses a fixed machine type, which can lead to DAG parsing failures if the parse-time computation is too heavyweight.

    In my case, I was loading configuration from an external source. That took a negligible amount of time compared to the other operations involved in creating the DAG, but it still broke something, because the Airflow webserver in Composer runs on App Engine, which has some strange behaviours.
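One rough way to check how heavy a DAG file's parse-time work is: time a single execution of its top-level code, since Airflow executes that code every time it parses the file. This is a minimal standard-library sketch; `time_top_level` is a hypothetical helper name, and running a real DAG file this way requires Airflow and the file's other dependencies to be importable in the same environment.

```python
import runpy
import time


def time_top_level(path: str) -> float:
    """Execute a Python file's top-level code once; return elapsed seconds.

    Airflow runs a DAG file's top-level code every time it parses the file,
    so this gives a rough per-parse cost. Imports inside the file count too.
    """
    start = time.perf_counter()
    runpy.run_path(path)  # runs module-level statements in a fresh namespace
    return time.perf_counter() - start
```

For example, `time_top_level("dags/my_pipeline.py")` (hypothetical path). Modules imported by the file stay cached in `sys.modules` after the first run, so the first call in a fresh interpreter is the most representative, and the number will not match the webserver's machine exactly.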

    I found a workaround in the discussion of this Google issue: create a separate DAG with a task that loads all the needed data and stores it in an Airflow Variable:

    from airflow.models import Variable
    Variable.set("pipeline_config", config, serialize_json=True)  # config: the data loaded from the external source

    Then I could read it back with:

    from airflow.models import Variable
    pipeline_config = Variable.get("pipeline_config", deserialize_json=True)

    From that, I could successfully generate the pipeline. An additional benefit is that I now get logs from that task, which I could not get from the webserver because of this issue.
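The round trip can be sketched without Airflow to show why it keeps parse-time work light: the expensive load runs once inside a task, and the DAG file only decodes a small JSON string. In this minimal sketch, `VARIABLE_STORE` is a hypothetical in-memory stand-in for Airflow's Variable table, and `loader_task`, `read_config_at_parse_time`, `build_task_ids` and the `"tables"` config key are illustrative names, not Airflow APIs. In real code the equivalents are `Variable.set(..., serialize_json=True)` and `Variable.get(..., deserialize_json=True)`; `Variable.get` also accepts a `default_var` argument for when the variable does not exist yet.

```python
import json
from typing import Dict, List

# Hypothetical in-memory stand-in for Airflow's Variable table (key -> JSON text).
VARIABLE_STORE: Dict[str, str] = {}


def loader_task(config: dict) -> None:
    """Body of the task in the separate loader DAG (runs on a worker).

    The expensive fetch from the external source would happen before this
    point; here we only serialize the result, as serialize_json=True does.
    """
    VARIABLE_STORE["pipeline_config"] = json.dumps(config)


def read_config_at_parse_time(default: dict) -> dict:
    """Cheap lookup meant for the pipeline DAG file's top-level code.

    Returns `default` if the loader has not run yet or the stored text is
    not valid JSON, so a missing or malformed value does not raise here.
    """
    raw = VARIABLE_STORE.get("pipeline_config")
    if raw is None:
        return default
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return default


def build_task_ids(config: dict) -> List[str]:
    """Generate one task id per configured table (hypothetical config shape)."""
    return [f"load_{table}" for table in config.get("tables", [])]


# Example: before and after the loader task has run.
print(build_task_ids(read_config_at_parse_time({"tables": []})))  # []
loader_task({"tables": ["orders", "users"]})
print(build_task_ids(read_config_at_parse_time({"tables": []})))  # ['load_orders', 'load_users']
```

Falling back to a default instead of raising keeps the pipeline DAG file parseable before the loader DAG has run for the first time.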
