What is the difference between min_file_process_interval and dag_dir_list_interval in Apache Airflow 1.9.0?

Posted by 家住魔仙堡 on 2019-12-21 05:11:10

Question


We are using Airflow v1.9.0. We have 100+ DAGs and the instance is really slow. The scheduler is only launching some tasks.

To reduce CPU usage, we want to tweak two configuration parameters: min_file_process_interval and dag_dir_list_interval. The documentation is not really clear about the difference between the two.


Answer 1:


min_file_process_interval:

In cases where there are only a small number of DAG definition files, the loop could potentially process the DAG definition files many times a minute. To control the rate of DAG file processing, the min_file_process_interval can be set to a higher value. This parameter ensures that a DAG definition file is not processed more often than once every min_file_process_interval seconds.
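
To make the throttling concrete, here is a minimal Python sketch of the idea. It is not Airflow's actual scheduler code: the function name files_due_for_processing and the last_processed dictionary are hypothetical, and timestamps are plain seconds.

```python
def files_due_for_processing(known_files, last_processed, now, min_file_process_interval):
    """Return the files that may be parsed again at time `now`.

    Hypothetical helper, not part of Airflow. A file is due if it has never
    been processed, or if at least `min_file_process_interval` seconds have
    passed since it was last processed.
    """
    due = []
    for path in known_files:
        last = last_processed.get(path)
        if last is None or now - last >= min_file_process_interval:
            due.append(path)
    return due


# a.py was processed 5 s ago, b.py 35 s ago, c.py never.
last_processed = {"a.py": 100.0, "b.py": 70.0}
due = files_due_for_processing(
    ["a.py", "b.py", "c.py"],
    last_processed,
    now=105.0,
    min_file_process_interval=30,
)
print(due)  # ['b.py', 'c.py']
```

With an interval of 30 seconds, a.py is skipped in this pass because it was parsed too recently. A larger interval means each file is re-parsed less often, which lowers CPU usage but delays picking up changes inside existing DAG files.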

dag_dir_list_interval:

Since the scheduler can run indefinitely, it's necessary to periodically refresh the list of files in the DAG definition directory. The refresh interval is controlled with the dag_dir_list_interval configuration parameter.
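
The directory refresh can be sketched the same way. This is an illustration under simplifying assumptions, not Airflow's implementation: the helper names list_dag_files and maybe_refresh_file_list are hypothetical, and every .py file is treated as a DAG definition file.

```python
import os
import tempfile


def list_dag_files(dag_dir):
    """Walk dag_dir and return all .py files (simplified file discovery)."""
    found = []
    for root, _dirs, files in os.walk(dag_dir):
        for name in files:
            if name.endswith(".py"):
                found.append(os.path.join(root, name))
    return sorted(found)


def maybe_refresh_file_list(dag_dir, known_files, last_listed, now, dag_dir_list_interval):
    """Re-list dag_dir only if dag_dir_list_interval seconds have elapsed.

    Hypothetical helper. Returns (files, time_of_last_listing).
    """
    if last_listed is None or now - last_listed >= dag_dir_list_interval:
        return list_dag_files(dag_dir), now
    return known_files, last_listed


with tempfile.TemporaryDirectory() as dag_dir:
    open(os.path.join(dag_dir, "first_dag.py"), "w").close()
    files, listed_at = maybe_refresh_file_list(dag_dir, [], None, 0.0, 300)

    # A new file appears, but only 60 s have passed, so it is not seen yet.
    open(os.path.join(dag_dir, "second_dag.py"), "w").close()
    files_early, listed_at = maybe_refresh_file_list(dag_dir, files, listed_at, 60.0, 300)

    # After 300 s the directory is listed again and the new file is found.
    files_late, listed_at = maybe_refresh_file_list(dag_dir, files_early, listed_at, 300.0, 300)

    names_early = [os.path.basename(p) for p in files_early]
    names_late = [os.path.basename(p) for p in files_late]

print(names_early)  # ['first_dag.py']
print(names_late)   # ['first_dag.py', 'second_dag.py']
```

In short, dag_dir_list_interval governs how quickly newly added or removed files are noticed, while min_file_process_interval governs how often each already-known file is re-parsed. Both settings live in the [scheduler] section of airflow.cfg.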

Source: A Google search on both terms led to this first result: https://cwiki.apache.org/confluence/display/AIRFLOW/Scheduler+Basics



Source: https://stackoverflow.com/questions/51558313/what-is-the-difference-between-min-file-process-interval-and-dag-dir-list-interv
