Airflow health check fails when too many tasks are running?

Submitted by 老子叫甜甜 on 2021-02-11 15:23:16

Question


I have a single-container Airflow setup running on Marathon, using the LocalExecutor. I have a health check that pings the /health endpoint on the Airflow webserver. The container currently has 5 CPUs allocated to it, and the webserver is running 4 Gunicorn workers. Last night I had about 25 tasks running concurrently. This caused the health check to fail without a helpful error message; the container just received a SIGTERM. Can anyone suggest a likely culprit for the health check failure? Was it CPU contention? Did I not create enough Gunicorn workers to respond to the health check request? I have a few ideas, but I'm not certain of the cause.

Here's the health check configuration in Marathon:

[
  {
    "gracePeriodSeconds": 300,
    "intervalSeconds": 60,
    "timeoutSeconds": 20,
    "maxConsecutiveFailures": 3,
    "portIndex": 0,
    "path": "/admin/",
    "protocol": "HTTP",
    "ignoreHttp1xx": false
  }
]
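
To make the effect of these settings concrete, here is a minimal Python sketch of how such a check could decide to kill the container. It is a simplified model under stated assumptions, not Marathon's actual implementation: a check counts as failed when there is no response or the response takes longer than timeoutSeconds, any passing check resets the counter, and the grace period and non-2xx responses are ignored. The function name first_kill_index and the sample response times are hypothetical.

```python
from typing import Optional, Sequence

# Values copied from the Marathon health check configuration above.
TIMEOUT_SECONDS = 20
MAX_CONSECUTIVE_FAILURES = 3


def first_kill_index(
    response_times: Sequence[Optional[float]],
    timeout: float = TIMEOUT_SECONDS,
    max_failures: int = MAX_CONSECUTIVE_FAILURES,
) -> Optional[int]:
    """Return the index of the check that would trigger a kill, or None.

    Simplified model (an assumption, not Marathon's code): a check fails
    if there is no response (None) or the response takes longer than
    `timeout`; a passing check resets the consecutive-failure counter.
    The grace period and non-2xx status codes are not modelled.
    """
    consecutive = 0
    for i, t in enumerate(response_times):
        if t is None or t > timeout:
            consecutive += 1
            if consecutive >= max_failures:
                return i
        else:
            consecutive = 0
    return None


# Hypothetical response times in seconds, one per 60-second interval.
busy_night = [0.4, 25.0, None, 1.2, 22.0, None, 31.0]
print(first_kill_index(busy_night))  # prints 6
```

Under this model, a single slow response is harmless. With intervalSeconds set to 60, the webserver has to miss three checks in a row, which means staying unresponsive for roughly two minutes or more, before the task is killed.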

Answer 1:


Yep, I have seen similar issues before. Would it be possible to migrate away from the LocalExecutor and a single-node Airflow setup?

If not, it's a case of vertically scaling your instance so that the webserver can still handle requests during periods of heavy compute demand from the tasks and the scheduler.



Source: https://stackoverflow.com/questions/62684593/airflow-health-checks-fails-when-too-many-tasks-are-running
