GKE - ErrImagePull pulling from Google Container Registry

Asked by 予麋鹿 on 2020-12-11 23:18

I have a Google Kubernetes Engine cluster which until recently was happily pulling private container images from a Google Container Registry bucket. I haven't changed anything…

5 Answers
  • 2020-12-11 23:56

    In my case, the issue turned out to be that the node pools generated by a minimal spec file are missing the oauth2 scopes that give access to the registry. Adding

    nodePools:
      config:
        oauthScopes:
        - https://www.googleapis.com/auth/devstorage.read_only
        - https://www.googleapis.com/auth/servicecontrol
        - https://www.googleapis.com/auth/service.management.readonly
        - https://www.googleapis.com/auth/trace.append
    
    

    to my spec fixed things. I think it's the devstorage scope that's the important one, but I'm not sure since I just copy-pasted the whole list of scopes from the spec the web console generates.
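
    For comparison, a node pool created with gcloud rather than a spec file can be given the same scopes at creation time. This is a rough sketch with placeholder pool, cluster, and zone names:

    # Placeholder names; the devstorage.read_only scope is the one that grants
    # read access to the GCS bucket backing GCR.
    gcloud container node-pools create my-pool \
        --cluster=my-cluster \
        --zone=us-central1-a \
        --scopes=https://www.googleapis.com/auth/devstorage.read_only,https://www.googleapis.com/auth/servicecontrol,https://www.googleapis.com/auth/service.management.readonly,https://www.googleapis.com/auth/trace.append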

  • 2020-12-11 23:56

    In my case, setting the correct OAuth scopes didn't work, so I configured it like any other private registry by adding imagePullSecrets to my Pod spec.

    Kubernetes Docs | Pull an Image from a Private Registry

    Sample script to generate registry credentials in a pipeline

    You could do this manually as well if you don't manage your infrastructure as code right now.

    # Setup registry credentials so we can pull images from gcr
    gcloud auth print-access-token | docker login -u oauth2accesstoken --password-stdin https://gcr.io
    
    kubectl create secret generic regcred \
        --namespace=development \
        --from-file=.dockerconfigjson="${HOME}/.docker/config.json" \
        --type=kubernetes.io/dockerconfigjson \
        --output yaml --dry-run=client | kubectl apply -f - # create or update if already created
    

    Sample Deployment File

    (Don't mind all the substitutions; they aren't relevant here. Just check the imagePullSecrets entry at the end of the YAML file.)

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      namespace: ${NAMESPACE}
      name: ${PROJECT_PREFIX}-${PROJECT_TYPE}-${PROJECT_NAME}
      labels:
        name: ${PROJECT_PREFIX}-${PROJECT_TYPE}-${PROJECT_NAME}
    spec:
      replicas: ${REPLICA_COUNT}
      selector:
        matchLabels:
          name: ${PROJECT_PREFIX}-${PROJECT_TYPE}-${PROJECT_NAME}
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 1
      template:
        metadata:
          labels:
            name: ${PROJECT_PREFIX}-${PROJECT_TYPE}-${PROJECT_NAME}
        spec:
          containers:
            - name: ${PROJECT_PREFIX}-${PROJECT_TYPE}-${PROJECT_NAME}
              image: gcr.io/${GOOGLE_PROJECT_ID}/${PROJECT_TYPE}-${PROJECT_NAME}:${GITHUB_SHA}
              imagePullPolicy: IfNotPresent
              ports:
                - name: http
                  containerPort: ${PORT}
                  protocol: TCP
              readinessProbe:
                httpGet:
                  path: /${PROJECT_NAME}/v1/health
                  port: ${PORT}
                initialDelaySeconds: 0
                timeoutSeconds: 10
                periodSeconds: 10
              resources:
                requests:
                  cpu: ${RESOURCES_CPU_REQUEST}
                  memory: ${RESOURCES_MEMORY_REQUEST}
                limits:
                  cpu: ${RESOURCES_CPU_LIMIT}
                  memory: ${RESOURCES_MEMORY_LIMIT}
              env:
                - name: NODE_ENV
                  value: ${NODE_ENV}
                - name: PORT
                  value: '${PORT}'
          imagePullSecrets:
            - name: regcred
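
    In a pipeline, the ${...} placeholders would typically be filled in before applying, for example with envsubst. The template file name below is a placeholder:

    # Hypothetical template file name; envsubst substitutes the ${...} variables
    # from the environment and kubectl applies the rendered manifest.
    envsubst < deployment.template.yaml | kubectl apply -f -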
    
  • 2020-12-11 23:57

    Check the node events for the actual error. For me it said:

    Failed to pull image "gcr.io/project/image@sha256:c8e91af54fc17faa1c49d2a05def5cbabf8f0a67fc558eb6cbca138061b8400a":
     rpc error: code = Unknown desc = error pulling image configuration: unknown blob
    

    That turned out to mean the image was gone or corrupted. After pushing the image again, it worked fine.
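
    To surface that message, describe the failing Pod or list recent events; the pod name and namespace below are placeholders:

    # Hypothetical pod/namespace names; the Events section at the bottom of the
    # describe output contains the exact pull error reported by the node.
    kubectl describe pod my-app-pod -n my-namespace
    kubectl get events -n my-namespace --sort-by=.metadata.creationTimestamp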

  • 2020-12-12 00:04

    I got the same issue when I created a cluster with Terraform. At first I only specified service_account in node_config, so the node pool was created with too few OAuth scopes. After explicitly setting both service_account and oauth_scopes as shown below, the nodes were able to pull images from private GCR repositories.

    resource "google_container_node_pool" "primary_preemptible_nodes" {
      node_config {
        service_account = "${google_service_account.gke_nodes.email}"
    
        oauth_scopes = [
          "storage-ro",
          "logging-write",
          "monitoring"
        ]
      }
    }
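
    One way to confirm the pool actually ended up with the intended scopes is to ask gcloud (pool, cluster, and zone names are placeholders):

    # Placeholder names; prints the OAuth scopes attached to the pool's nodes.
    gcloud container node-pools describe primary-preemptible-nodes \
        --cluster=my-cluster --zone=us-central1-a \
        --format='value(config.oauthScopes)'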
    
  • 2020-12-12 00:07

    Ok, this turned out to be tricky, but the cause was this:

    I used Terraform to set the service account for the nodes in the GKE cluster, but instead of using the email output of the google_service_account resource to specify the service account, I used the unique_id output. Both Terraform and the Google Cloud API accepted this without complaint.

    When Kubernetes (and other things) tried to access the internal metadata API on each node to get a token it could use, it received a Service account is invalid/disabled response with a 403 status.

    Recreating the node pool with the correctly specified service account fixed the problem.
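
    The symptom can be reproduced directly against the node's metadata server, for example from a pod or via SSH on the node (a rough sketch):

    # Query the GCE metadata server for a token; with a misconfigured service
    # account this returns a 403 "Service account is invalid/disabled" error
    # instead of an access token.
    curl -s -H "Metadata-Flavor: Google" \
      "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token"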
