Question
A custom App Engine environment fails to start, apparently because of failing health checks. The app has a few custom dependencies (e.g. PostGIS, GDAL), so there are a few layers on top of the App Engine image. It builds successfully and runs locally in a Docker container.
ERROR: (gcloud.app.deploy) Error Response: [4] Your deployment has failed to become healthy in the allotted time and therefore was rolled back. If you believe this was an error, try adjusting the 'app_start_timeout_sec' setting in the 'readiness_check' section.
The Dockerfile looks as follows (note: no CMD, as the entrypoint is defined in docker-compose.yml and app.yaml):
FROM gcr.io/google-appengine/python
ENV PYTHONUNBUFFERED 1
ENV DEBIAN_FRONTEND noninteractive
RUN apt -y update && apt -y upgrade\
&& apt-get install -y software-properties-common \
&& add-apt-repository -y ppa:ubuntugis/ppa \
&& apt -y update \
&& apt-get -y install gdal-bin libgdal-dev python3-gdal \
&& apt-get autoremove -y \
&& apt-get autoclean -y \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
ADD requirements.txt /app/requirements.txt
RUN python3 -m pip install -r /app/requirements.txt
ADD . /app/
WORKDIR /app
This unfortunately creates an image of a whopping 1.58GB, but the original gcr.io python image starts at 1.05GB, so I don't think the size of the image would or should be a problem.
Running this locally with the following docker-compose.yml config beautifully spins up a container in no time:
version: "3"
services:
  web:
    build: .
    command: gunicorn gisapplication.wsgi --bind 0.0.0.0:8080
So, I would have expected the following app.yaml to do the trick:
runtime: custom
env: flex
entrypoint: gunicorn -b :$PORT gisapplication.wsgi
beta_settings:
  cloud_sql_instances: <sql-db-connection>
runtime_config:
  python_version: 3
No luck. As per the error above, it seemed to have something to do with the readiness check, so I tried increasing the timeout for the app to start (15 mins!). There seem to have been some issues with health checks previously, and rolling back to legacy health checks is not a solution as of Sept 2019.
readiness_check:
  path: "/readiness_check"
  check_interval_sec: 10
  timeout_sec: 10
  failure_threshold: 3
  success_threshold: 3
  app_start_timeout_sec: 900
liveness_check:
  path: "/liveness_check"
  check_interval_sec: 60
  timeout_sec: 4
  failure_threshold: 3
  success_threshold: 2
  initial_delay_sec: 30
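As a rough back-of-envelope sketch, the timing these settings imply can be worked out as follows. It assumes that checks fire exactly every check_interval_sec and that the thresholds count consecutive results; it ignores timeout_sec and instance provisioning time. The helper name consecutive_window_sec is hypothetical.

```python
# Values copied from the readiness_check and liveness_check sections above.
READINESS = {"check_interval_sec": 10, "failure_threshold": 3,
             "success_threshold": 3, "app_start_timeout_sec": 900}
LIVENESS = {"check_interval_sec": 60, "failure_threshold": 3,
            "success_threshold": 2, "initial_delay_sec": 30}


def consecutive_window_sec(check, threshold_key):
    """Approximate seconds needed to observe the given number of consecutive results.

    Assumption: one check per interval, thresholds count consecutive results.
    """
    return check[threshold_key] * check["check_interval_sec"]


# Seconds of consecutive successes before the instance counts as ready.
ready_after = consecutive_window_sec(READINESS, "success_threshold")
# Seconds of consecutive failures before the instance counts as not ready.
unready_after = consecutive_window_sec(READINESS, "failure_threshold")
# Seconds of consecutive liveness failures before the instance counts as unhealthy.
restart_after = consecutive_window_sec(LIVENESS, "failure_threshold")
# Upper bound on readiness checks that fit into the start-up window.
checks_in_start_window = (READINESS["app_start_timeout_sec"]
                          // READINESS["check_interval_sec"])

print(ready_after, unready_after, restart_after, checks_in_start_window)
```

Under these assumptions a correctly responding app would need only about 30 seconds of successful readiness checks, far less than the 900-second start-up window, which suggests the timeout values themselves are unlikely to be the limiting factor.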
Split health checks are definitely on. The output from gcloud beta app describe is:
authDomain: gmail.com
codeBucket: staging.proj-id-000000.appspot.com
databaseType: CLOUD_DATASTORE_COMPATIBILITY
defaultBucket: proj-id-000000.appspot.com
defaultHostname: proj-id-000000.ts.r.appspot.com
featureSettings:
  splitHealthChecks: true
  useContainerOptimizedOs: true
gcrDomain: asia.gcr.io
id: proj-id-000000
locationId: australia-southeast1
name: apps/proj-id-000000
servingStatus: SERVING
That didn't work, so I also tried increasing the resources available to the instance and allocated the maximum amount of memory for 1 CPU (6.1GB):
resources:
  cpu: 1
  memory_gb: 6.1
  disk_size_gb: 10
Just to be on the safe side, I added health check endpoints to the app (for both the legacy health checks and the split health checks). It's a Django app, so this went into the project's urls.py:
path(r'_ah/health/', lambda r: HttpResponse("OK", status=200)),
path(r'readiness_check/', lambda r: HttpResponse("OK", status=200)),
path(r'liveness_check/', lambda r: HttpResponse("OK", status=200)),
So, when I dive into the logs, there seems to be a successful request to /liveness_check from a curl user agent, but the subsequent requests to /readiness_check from the GoogleHC agent return a 503 (Service Unavailable).
Shortly after (after 8 failed requests - why 8?) a shutdown trigger seems to be sent:
2020-07-05 09:00:02.603 AEST Triggering app shutdown handlers.
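One way to see what the app itself returns for the exact paths configured in app.yaml is to probe the locally running container. The following is a minimal sketch using only the Python standard library. The helper name probe and the base URL http://localhost:8080 are assumptions, and the compose file shown earlier would need a ports mapping before the container is reachable from the host.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Report 3xx responses instead of silently following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code


def probe(base_url, path, timeout=4.0):
    """Return the HTTP status of GET base_url + path, or None if unreachable."""
    # An empty ProxyHandler bypasses any proxy settings for local requests.
    opener = urllib.request.build_opener(
        _NoRedirect, urllib.request.ProxyHandler({})
    )
    try:
        with opener.open(base_url + path, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 3xx, 4xx and 5xx responses all land here
    except (urllib.error.URLError, OSError):
        return None  # nothing listening, or the request timed out


# Example usage (assumes the container is published on localhost:8080):
# for path in ("/liveness_check", "/readiness_check", "/readiness_check/"):
#     print(path, probe("http://localhost:8080", path))
```

Redirects are deliberately not followed, so a path that only answers with a redirect shows up as a 3xx status rather than as the 200 of the redirect target.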
Any ideas of what is going on here? I think I've pretty much exhausted the options to fix this problem and wonder whether the time wouldn't have been better invested in getting things up and running in Compute/EC2.
ADDENDUM:
In addition to the SO issue linked, I've gone through issues on Google (here and here).
Answer 1:
You are sending the readiness check to path: "/readiness_check", but your URL handler for that is path(r'readiness_check/'...). Note the trailing slash in the handler. Remove it (or add a trailing slash to the path under readiness_check:) and see if that fixes it. I would expect that mismatch to give you a 404, but you are getting a 503, which tells me that you may have a more serious error. Click one of the arrows at the left of a 503 in the console and see what the error message is. You may need to search the console for traceback to see it.
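The trailing-slash mismatch can be illustrated with a small standard-library sketch. It only approximates Django's resolver, assuming that the leading slash of the request path is dropped and that a literal path() route must then match the whole remainder. The helper name django_style_match is hypothetical and not part of Django.

```python
import re


def django_style_match(route: str, request_path: str) -> bool:
    """Approximate matching of a literal path() route against a request path.

    Assumption: the resolver drops one leading slash from the request path
    and then requires the route to match the entire remainder.
    """
    remainder = request_path[1:] if request_path.startswith("/") else request_path
    return re.fullmatch(re.escape(route), remainder) is not None


ROUTE = "readiness_check/"  # route registered in urls.py

print(django_style_match(ROUTE, "/readiness_check"))   # False: path from app.yaml
print(django_style_match(ROUTE, "/readiness_check/"))  # True: path with trailing slash
```

With Django's default APPEND_SLASH setting and CommonMiddleware enabled, the unmatched request would typically be answered with a redirect to the slash-terminated URL rather than a 200, which a health checker expecting a 200 may not accept.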
Source: https://stackoverflow.com/questions/62735687/google-app-engine-deployment-fails-because-of-failing-readiness-check