gcloud - Cloud Run deployment to GKE fails

Question

I am trying to deploy a sample Angular app to GKE. I created a sample cluster with the Cloud Run and Istio add-ons enabled:

gcloud beta container clusters create new-cluster \
--addons=HorizontalPodAutoscaling,HttpLoadBalancing,Istio,CloudRun \
--machine-type=n1-standard-2 \
--cluster-version=latest \
--zone=us-east1-b \
--enable-stackdriver-kubernetes --enable-ip-alias \
--scopes cloud-platform --num-nodes 4  --disk-size "10"  --image-type "COS"

These are the steps in my cloudbuild.yaml file:

steps:
  # build the container image
  - name: gcr.io/cloud-builders/docker
    args: [ build, -t, gcr.io/$GCLOUD_PROJECT/gcp-cloudrun-gke-angular:1.01, . ]

  # push the container image to Container Registry
  - name: gcr.io/cloud-builders/docker
    args: [ push, gcr.io/$GCLOUD_PROJECT/gcp-cloudrun-gke-angular:1.01 ]

  # Deploy container image to Cloud Run
  - name: gcr.io/cloud-builders/gcloud
    args: [ beta, run, deploy, feedback-ui-deploy-anthos, --image, gcr.io/$GCLOUD_PROJECT/gcp-cloudrun-gke-angular:1.01, --platform, gke, --cluster, cloudrun-angular-cluster, --cluster-location, us-central1-a ]

images:
  - gcr.io/$GCLOUD_PROJECT/gcp-cloudrun-gke-angular:1.01

I have set the environment variable for the gcloud project. Now, whenever I try to deploy this to the GKE cluster created above, I end up with a revision-unavailable error:

Deploying new service... Configuration "service-1" does not have any ready Revision.
  - Creating Revision...
  X Routing traffic... Configuration "service-1" does not have any ready Revision.

This is the command I used to deploy to Cloud Run:

gcloud beta run deploy --platform gke --cluster new-cluster --image gcr.io/$GCLOUD_PROJECT/gcp-cloudrun-gke-angular:1.01 --cluster-location us-east1-b

Fully managed Cloud Run works perfectly, but when I deploy to existing GKE clusters, I get this error. According to the documentation, a revision is created automatically for a new service, so I am not sure why that is not happening for mine.

EDIT: Here is the kubectl describe output. I deleted all clusters, created a fresh one, and still got the same result.

Describing the service gives me this:

Note: I use the default namespace. I am not sure whether that has any bearing on this issue.

Status:
  Conditions:
    Last Transition Time:  2019-12-04T12:49:59Z
    Message:               Revision "gke-service-00001-pef" failed with message: Container failed with: nginx: [alert] could not open error log file: open() "/var/log/nginx/error.log" failed (2: No such file or directory)
2019/12/04 12:49:40 [emerg] 1#1: open() "/var/log/nginx/error.log" failed (2: No such file or directory)
.
    Reason:                      RevisionFailed
    Status:                      False
    Type:                        ConfigurationsReady
    Last Transition Time:        2019-12-04T12:49:59Z
    Message:                     Configuration "gke-service" does not have any ready Revision.
    Reason:                      RevisionMissing
    Status:                      False
    Type:                        Ready
    Last Transition Time:        2019-12-04T12:49:59Z
    Message:                     Configuration "gke-service" does not have any ready Revision.
    Reason:                      RevisionMissing
    Status:                      False
    Type:                        RoutesReady
  Latest Created Revision Name:  gke-service-00001-pef
  Observed Generation:           1
  URL:                           http://gke-service.default.example.com
Events:
  Type    Reason   Age                  From                Message
  ----    ------   ----                 ----                -------
  Normal  Created  2m21s                service-controller  Created Configuration "gke-service"
  Normal  Created  2m21s                service-controller  Created Route "gke-service"
  Normal  Updated  20s (x5 over 2m21s)  service-controller  Updated Service "gke-service"

Since I am serving the Angular index.html file via nginx, this is my configuration:

server {


  listen 8080 default_server;

  sendfile on;

  default_type application/octet-stream;

  gzip on;
  gzip_http_version 1.1;
  gzip_disable      "MSIE [1-6]\.";
  gzip_min_length   1100;
  gzip_vary         on;
  gzip_proxied      expired no-cache no-store private auth;
  gzip_types        text/plain text/css application/json application/javascript application/x-javascript text/xml application/xml application/xml+rss text/javascript;
  gzip_comp_level   9;


  root /usr/share/nginx/html;


  location / {
    try_files $uri $uri/ /index.html =404;
    #proxy_pass: "http://localhost:8080/AdTechUIContent"
    #uncomment to include naxsi rules
    #include /etc/nginx/naxsi.rules
  }

}

This works fine when I build the Docker image locally, and I am able to access it. Just in case, this is my Dockerfile:

FROM node:12.13-alpine as app-ui-builder

#Now install angular cli globally
RUN npm install -g @angular/cli@8.3.14
#RUN npm config set registry https://registry.cnpmjs.org
#Install git and openssh because the alpine image doesn't include git, and some npm modules
#have dependencies hosted in git repositories, so git is needed to install them
RUN apk add --update git openssh
RUN mkdir ./app
COPY package*.json /app/
WORKDIR ./app
COPY . .
RUN npm cache clear --force && npm i

RUN ls && $(npm bin)/ng build --prod

FROM nginx:1.17.5-alpine AS nginx-builder
RUN apk update && apk add ca-certificates && rm -rf /var/cache/apk/*
COPY app-ui-nginx.conf /etc/nginx/conf.d
RUN rm -rf /usr/share/nginx/html/*
COPY --from=app-ui-builder /app/dist/app-ui /usr/share/nginx/html
RUN ls /usr/share/nginx/html
RUN chmod -R a+r /usr/share/nginx/html

EXPOSE 8080
#
CMD ["nginx", "-g", " daemon off;"]

@AhmetB, could you let me know why nginx is throwing an error here?

EDIT: I did try deploying the app with plain kubectl commands, using a Deployment and a Service, and it worked fine. So I am not sure which Cloud Run contract the nginx error logging violates, even though the file can be found.

Answer 1:

Deploying new service... Configuration "service-1" does not have any ready Revision.

This error means that the service was deployed, but for some reason the pod is crashing or not being scheduled. This can happen for all sorts of reasons, such as not enough CPU/memory on the node, the image not being pullable from GCR, or the app crash-looping.

Look at "kubectl logs" and "kubectl describe" output for your application. Try:

kubectl get ksvc
kubectl get pods
kubectl describe ksvc NAME
kubectl logs NAME -c user-container
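The commands above can be bundled into one helper so the output is collected in a single pass. This is a minimal sketch, assuming bash, a kubectl context that already points at the cluster, and that the service's pods carry the Knative label `serving.knative.dev/service=<name>`; the function name `diagnose_ksvc` is hypothetical.

```shell
# Hypothetical helper: gather the diagnostics suggested above for one
# Knative / Cloud Run for Anthos service (namespace defaults to "default").
diagnose_ksvc() {
  local svc="$1" ns="${2:-default}"

  # Overall service state, then just the message of its Ready condition.
  kubectl get ksvc "$svc" -n "$ns"
  kubectl get ksvc "$svc" -n "$ns" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'
  echo

  # Full description, including recent events.
  kubectl describe ksvc "$svc" -n "$ns"

  # Logs of the application container in every pod of this service
  # (assumes the serving.knative.dev/service label is set on the pods).
  local pod
  for pod in $(kubectl get pods -n "$ns" \
      -l "serving.knative.dev/service=$svc" -o name); do
    kubectl logs "$pod" -n "$ns" -c user-container
  done
}
```

For the service in the question this would be run as `diagnose_ksvc gke-service`. Because the container exits immediately, adding `--previous` to the `kubectl logs` call may be needed to see the last crash.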

Answer 2:

I found the issue. It looks like the log files (both the error and access logs) need to be written to a custom folder that Cloud Run can access. Cloud Run checks that those folders are available before spinning up a revision. With the old nginx config file, no custom folders were created. After modifying the nginx config and redeploying, it worked fine.

I created two files. nginx.conf:

user nginx;
worker_processes  1;

error_log  /var/logs/nginx/error.log warn;
pid        /var/run/nginx.pid;


events {
    worker_connections  1024;
}


http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/logs/nginx/access.log  main;

    sendfile        on;
    #tcp_nopush     on;

    keepalive_timeout  65;

    #gzip  on;

    include /etc/nginx/conf.d/*.conf;
}

default.conf

server {
    listen       8080;
    server_name  localhost;

    location / {
        root   /usr/share/nginx/html;
        index  index.html index.htm;
    }

    # redirect server error pages to the static page /50x.html
    #
    error_page   500 502 503 504  /50x.html;
    location = /50x.html {
        root   /usr/share/nginx/html;
    }
}

I also modified the Dockerfile:

FROM node:12.13-alpine as app-ui-builder

RUN npm install -g @angular/cli@8.3.14
RUN apk add --update git openssh
RUN mkdir ./app
COPY package*.json /app/
WORKDIR ./app
COPY . .
RUN npm cache clear --force && npm i

RUN ls && $(npm bin)/ng build --prod

FROM nginx:alpine AS nginx-builder
RUN apk update && apk add ca-certificates && rm -rf /var/cache/apk/*
#RUN rm -rf /etc/nginx/conf.d/*
RUN mkdir -p /var/logs/nginx
COPY ./docker/nginx.conf /etc/nginx/
## Copy a new configuration file setting listen port to 8080
COPY ./docker/default.conf /etc/nginx/conf.d/
RUN rm -rf /usr/share/nginx/html/*
#
COPY --from=app-ui-builder /app/dist/app-ui /usr/share/nginx/html
EXPOSE 8080
CMD ["nginx", "-g", " daemon off;"]

I found this fix via a Medium post.
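This fix only works if the directory created in the Dockerfile (`/var/logs/nginx`) matches the log paths in nginx.conf; note `/var/logs`, not the stock `/var/log`. A mismatch can be caught by listing configured log directories that do not exist. This is a minimal sketch, assuming a POSIX shell with `sed -E` (GNU or BusyBox). `missing_log_dirs` is a hypothetical name, and the function only inspects `error_log`/`access_log` lines in the one file it is given; it does not follow `include`.

```shell
# Hypothetical helper: print each directory used by an error_log or
# access_log directive in an nginx config file that does not exist here.
missing_log_dirs() {
  local conf="$1"
  # Keep only the first argument of each uncommented log directive.
  sed -n -E 's/^[[:space:]]*(error_log|access_log)[[:space:]]+([^[:space:];]+).*/\2/p' "$conf" |
    while read -r path; do
      # Skip targets that are not regular files in a directory we create.
      case "$path" in
        stderr|off|syslog:*|/dev/*) continue ;;
      esac
      dir="$(dirname "$path")"
      [ -d "$dir" ] || echo "$dir"
    done | sort -u
}
```

Running `missing_log_dirs /etc/nginx/nginx.conf` in a shell inside the built image should print nothing when the `mkdir` step and the config agree.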

Answer 3:

Does your cluster have role-based access control (RBAC) Storage permissions? I also suggest verifying the permissions required to deploy to Cloud Run for Anthos.

Check whether you have the Storage permission and scopes.
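The scopes part of this check can be scripted. This is a minimal sketch, assuming bash, an authenticated gcloud, and that the default node pool runs the pods; it reads the cluster's top-level `nodeConfig.oauthScopes` field. `has_storage_scope` is a hypothetical name, and it checks only OAuth scopes, not IAM or RBAC permissions.

```shell
# Hypothetical helper: succeed if the cluster's default node config has a
# scope that allows reading images from Container Registry (GCS-backed).
has_storage_scope() {
  local cluster="$1" zone="$2" scopes
  scopes="$(gcloud container clusters describe "$cluster" --zone "$zone" \
    --format='value(nodeConfig.oauthScopes)')"
  # cloud-platform covers all Google APIs; devstorage.* scopes grant
  # Cloud Storage access directly.
  case "$scopes" in
    *auth/cloud-platform*|*auth/devstorage.*) return 0 ;;
    *) return 1 ;;
  esac
}
```

The cluster in the question was created with `--scopes cloud-platform`, so `has_storage_scope new-cluster us-east1-b` would be expected to succeed there.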

Source: https://stackoverflow.com/questions/59107010/gcloud-cloud-run-deployment-fails-for-deployment-to-gke

Tags

google-kubernetes-engine

gcloud

google-cloud-run

google-anthos