Kubernetes pod distribution amongst nodes

执笔经年 2020-12-28 14:22

Is there any way to make Kubernetes distribute pods across nodes as evenly as possible? I have resource requests set on all deployments, global requests, and an HPA. All nodes are identical.

2 Answers
  • 2020-12-28 15:11

    Here I build on Anirudh's answer by adding example code.

    My initial Kubernetes YAML looked like this:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: say-deployment
    spec:
      replicas: 6
      selector:
        matchLabels:
          app: say
      template:
        metadata:
          labels:
            app: say
        spec:
          containers:
          - name: say
            image: gcr.io/hazel-champion-200108/say
            ports:
            - containerPort: 8080
    ---
    kind: Service
    apiVersion: v1
    metadata:
      name: say-service
    spec:
      selector:
        app: say
      ports:
        - protocol: TCP
          port: 8080
      type: LoadBalancer
      externalIPs:
        - 192.168.0.112
    

    At this point, the Kubernetes scheduler somehow decides that all 6 replicas should be deployed on the same node.

    Then I added a requiredDuringSchedulingIgnoredDuringExecution pod anti-affinity rule to force the pods to be deployed on different nodes:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: say-deployment
    spec:
      replicas: 6
      selector:
        matchLabels:
          app: say
      template:
        metadata:
          labels:
            app: say
        spec:
          containers:
          - name: say
            image: gcr.io/hazel-champion-200108/say
            ports:
            - containerPort: 8080
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                - labelSelector:
                    matchExpressions:
                      - key: "app"
                        operator: In
                        values:
                          - say
                  topologyKey: "kubernetes.io/hostname"
    ---
    kind: Service
    apiVersion: v1
    metadata:
      name: say-service
    spec:
      selector:
        app: say
      ports:
        - protocol: TCP
          port: 8080
      type: LoadBalancer
      externalIPs:
        - 192.168.0.112
    

    Now all the scheduled pods run on different nodes. And since I have 3 nodes and 6 pods, the other 3 pods (6 minus 3) cannot be scheduled and stay Pending. This is because I made it a hard requirement with requiredDuringSchedulingIgnoredDuringExecution.

    kubectl get pods -o wide 
    
    NAME                              READY     STATUS    RESTARTS   AGE       IP            NODE
    say-deployment-8b46845d8-4zdw2    1/1       Running   0          24s       10.244.2.80   night
    say-deployment-8b46845d8-699wg    0/1       Pending   0          24s       <none>        <none>
    say-deployment-8b46845d8-7nvqp    1/1       Running   0          24s       10.244.1.72   gray
    say-deployment-8b46845d8-bzw48    1/1       Running   0          24s       10.244.0.25   np3
    say-deployment-8b46845d8-vwn8g    0/1       Pending   0          24s       <none>        <none>
    say-deployment-8b46845d8-ws8lr    0/1       Pending   0          24s       <none>        <none>
    

    Now if I loosen this requirement with preferredDuringSchedulingIgnoredDuringExecution:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: say-deployment
    spec:
      replicas: 6
      selector:
        matchLabels:
          app: say
      template:
        metadata:
          labels:
            app: say
        spec:
          containers:
          - name: say
            image: gcr.io/hazel-champion-200108/say
            ports:
            - containerPort: 8080
          affinity:
            podAntiAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 100
                  podAffinityTerm:
                    labelSelector:
                      matchExpressions:
                        - key: "app"
                          operator: In
                          values:
                            - say
                    topologyKey: "kubernetes.io/hostname"
    ---
    kind: Service
    apiVersion: v1
    metadata:
      name: say-service
    spec:
      selector:
        app: say
      ports:
        - protocol: TCP
          port: 8080
      type: LoadBalancer
      externalIPs:
        - 192.168.0.112
    

    The first 3 pods are deployed on 3 different nodes, just as in the previous case. The remaining 3 (6 pods minus 3 nodes) are then placed on various nodes according to the scheduler's internal scoring.

    NAME                              READY     STATUS    RESTARTS   AGE       IP            NODE
    say-deployment-57cf5fb49b-26nvl   1/1       Running   0          59s       10.244.2.81   night
    say-deployment-57cf5fb49b-2wnsc   1/1       Running   0          59s       10.244.0.27   np3
    say-deployment-57cf5fb49b-6v24l   1/1       Running   0          59s       10.244.1.73   gray
    say-deployment-57cf5fb49b-cxkbz   1/1       Running   0          59s       10.244.0.26   np3
    say-deployment-57cf5fb49b-dxpcf   1/1       Running   0          59s       10.244.1.75   gray
    say-deployment-57cf5fb49b-vv98p   1/1       Running   0          59s       10.244.1.74   gray
    
  • 2020-12-28 15:11

    Sounds like what you want is Inter-Pod Affinity and Pod Anti-affinity. From the Kubernetes documentation:

    Inter-pod affinity and anti-affinity were introduced in Kubernetes 1.4. Inter-pod affinity and anti-affinity allow you to constrain which nodes your pod is eligible to schedule on based on labels on pods that are already running on the node rather than based on labels on nodes. The rules are of the form “this pod should (or, in the case of anti-affinity, should not) run in an X if that X is already running one or more pods that meet rule Y.”

    Y is expressed as a LabelSelector with an associated list of namespaces (or “all” namespaces); unlike nodes, because pods are namespaced (and therefore the labels on pods are implicitly namespaced), a label selector over pod labels must specify which namespaces the selector should apply to.

    Conceptually X is a topology domain like node, rack, cloud provider zone, cloud provider region, etc. You express it using a topologyKey, which is the key for the node label that the system uses to denote such a topology domain, e.g. the built-in node labels listed in the Kubernetes documentation.
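
    To make the topologyKey idea concrete, here is a small sketch of my own (not part of the original answer) of an anti-affinity term that spreads pods across zones instead of individual hosts; it assumes the nodes carry the standard topology.kubernetes.io/zone label:

    # Illustrative pod-spec fragment (assumption: nodes are labeled with
    # topology.kubernetes.io/zone, which most cloud providers set automatically).
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
                - key: "app"
                  operator: In
                  values:
                    - say
            topologyKey: "topology.kubernetes.io/zone"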

    Anti-affinity can be used to ensure that you spread your pods across failure domains. You can state these rules as preferences or as hard rules. In the latter case, if the scheduler is unable to satisfy your constraint, the pod will fail to get scheduled, as in the sketch below.
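
    As a minimal sketch of my own (not taken from the answer itself), the two flavours of rule go under different fields of podAntiAffinity; the comments note the scheduling consequence of each:

    # Illustrative pod-spec fragment only; in practice you would normally pick one of the two forms.
    affinity:
      podAntiAffinity:
        # Hard rule: if no node can satisfy it, the pod stays Pending.
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: say
            topologyKey: "kubernetes.io/hostname"
        # Preference: the scheduler tries to honor it but still places the pod if it cannot.
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: say
              topologyKey: "kubernetes.io/hostname"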
