Kubernetes

Elastic Training Operator: A Tool for Elastic Deep Learning Training on Kubernetes

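一世执手 submitted on 2021-02-19 17:13:49
Background: Because cloud computing has inherent advantages in resource cost and elastic scaling, more and more customers are choosing to build AI systems in the cloud. Cloud-native technologies such as containers and Kubernetes have become the shortest path to unlocking the value of the cloud, and building AI platforms on Kubernetes in the cloud has become a trend. When a model is complex or the dataset is large, the compute capacity of a single machine often cannot meet the demand. With a distributed training framework such as Alibaba's AiACC or the community's Horovod, changing just a few lines of code turns a single-machine training job into a distributed one. On Kubernetes, the common options are the Kubeflow community's tf-operator, which supports TensorFlow PS mode, and mpi-operator, which supports Horovod's MPI allreduce mode. Current state: Kubernetes and cloud computing provide agility and scalability. Components such as cluster-AutoScaler let us set elastic policies for training jobs, using Kubernetes' elasticity to create resources on demand and reduce idle GPU time. However, this scaling model still falls short for offline workloads such as training: There is no fault tolerance; when some workers fail because of device problems, the whole job must stop and start over. Training jobs usually run for a long time and consume a lot of compute, yet they lack elasticity; when resources run short, they cannot be freed for other workloads on demand unless the job is terminated. Training jobs run for a long time and do not support dynamic worker configuration,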
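
To make the allreduce mode mentioned above more concrete, here is a minimal pure-Python sketch of the idea: each worker computes a local gradient, and an allreduce leaves every worker holding the same averaged gradient. This is a single-process simulation only. It is not the Horovod or MPI API, and the names `allreduce_mean` and `sgd_step` are hypothetical helpers introduced for illustration.

```python
from typing import List


def allreduce_mean(worker_grads: List[List[float]]) -> List[List[float]]:
    """Simulate an allreduce that averages gradients across workers.

    worker_grads[i] is the local gradient vector computed on worker i.
    The result has one entry per worker, and every entry is the same
    element-wise mean. (Hypothetical helper, not a real library call.)
    """
    if not worker_grads:
        raise ValueError("need at least one worker")
    n_workers = len(worker_grads)
    dim = len(worker_grads[0])
    if any(len(g) != dim for g in worker_grads):
        raise ValueError("all gradients must have the same length")
    # Reduce step: element-wise sum over all workers.
    total = [sum(g[k] for g in worker_grads) for k in range(dim)]
    # Average, then "broadcast": every worker receives its own copy.
    mean = [t / n_workers for t in total]
    return [list(mean) for _ in range(n_workers)]


def sgd_step(weights: List[float], grad: List[float], lr: float) -> List[float]:
    """Apply one plain SGD update with the (already averaged) gradient."""
    return [w - lr * g for w, g in zip(weights, grad)]


if __name__ == "__main__":
    # Two simulated workers, each with its own local gradient.
    local_grads = [[1.0, 2.0], [3.0, 4.0]]
    synced = allreduce_mean(local_grads)
    # Every worker applies the same update, so the replicas stay identical.
    print([sgd_step([1.0, 1.0], g, lr=0.5) for g in synced])
```

Because the average is divided by the number of workers taking part, the same function works for any worker count, which is the property an elastic job relies on when workers join or leave between steps.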

Can I access K8s ClusterIP from k8s node directly?

耗尽温柔 submitted on 2021-02-19 08:33:34
Question: I am using k8s 1.2 on Ubuntu 14.04.4. Here is some info on my one k8s minion node:
    # cat /etc/os-release
    NAME="Ubuntu"
    VERSION="14.04.4 LTS, Trusty Tahr"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 14.04.4 LTS"
    VERSION_ID="14.04"
    HOME_URL="http://www.ubuntu.com/"
    SUPPORT_URL="http://help.ubuntu.com/"
    BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
    # uname -a
    Linux k8s-010 3.19.0-47-generic #53~14.04.1-Ubuntu SMP Mon Jan 18 16:09:14 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
You see, I

Access .NET Core app on Kubernetes on both http and https

半城伤御伤魂 submitted on 2021-02-19 08:32:31
Question: Being new to Kubernetes, I am trying to make a simple .NET Core 3 MVC app run on Kubernetes and respond on port 443 as well as port 80. I have a working Docker Compose setup that I am trying to port to Kubernetes. I am running Docker Desktop CE with nginx-ingress on Windows 10 Pro. So far it works on port 80 (http://mymvc.local on the Windows 10 host; the hosts file maps mymvc.local to 127.0.0.1). My MVC app runs behind the service mvc on port 5000. I've made a self-signed certificate for the domain

Kubernetes - “Mount Volume Failed” when trying to deploy

末鹿安然 submitted on 2021-02-19 07:49:51
Question: I deployed my first container and got the message deployment.apps/frontarena-ads-deployment created, but then I saw that container creation was stuck in Waiting status. I looked at the output of kubectl describe pod frontarena-ads-deployment-5b475667dd-gzmlp and found a MountVolume error whose cause I cannot figure out: Warning FailedMount 9m24s kubelet MountVolume.SetUp failed for volume "ads-filesharevolume" : mount failed: exit status 32 Mounting command: systemd-run Mounting arguments: -

3 Kubernetes clusters 1 base on local machine

这一生的挚爱 submitted on 2021-02-19 07:43:32
Question: I would like to learn Kubernetes and set it up on my laptop. The architecture would be as follows:
- Create 4 Ubuntu 18.04 server VM instances on my laptop
- 3 of the 4 VMs will be Kubernetes clusters and 1 VM will be the base
- Access the base VM via SSH
For virtualization, I am using VirtualBox. The question is, how do I achieve this?
Answer 1: To set up a Kubernetes cluster on Ubuntu servers with VirtualBox and kubeadm, follow these steps: Prerequisites: Virtual machines with specification of

Kubernetes: How to automatically clean up unused images

笑着哭i submitted on 2021-02-19 07:35:49
Question: Due to some internal issues, we need to remove unused images as soon as they become unused. I know it's possible to use garbage collection, but it doesn't offer a policy as strict as we need. I've come across this solution, but:
- it's deprecated
- it also removes containers and possibly mounted volumes
I was thinking about setting up a cron job directly on the nodes to run docker prune, but I hope there is a better way. No idea if it makes a difference, but we are using AKS.
Answer 1: This doesn't really
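
To illustrate why the built-in garbage collection the question mentions is not a strict "remove as soon as unused" policy, here is a simplified Python model of threshold-based image cleanup. It assumes the behavior behind the kubelet settings `imageGCHighThresholdPercent` and `imageGCLowThresholdPercent` (defaults 85 and 80): nothing is removed until disk usage reaches the high threshold, and then the least recently used unused images are removed until usage falls to the low threshold. The `Image` class and `images_to_collect` function are hypothetical names for this sketch, not kubelet code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Image:
    name: str
    size_gb: float
    last_used: int  # larger value = used more recently
    in_use: bool    # True if a container currently references the image


def images_to_collect(
    images: List[Image],
    disk_capacity_gb: float,
    disk_used_gb: float,
    high_pct: float = 85,
    low_pct: float = 80,
) -> List[str]:
    """Return the names of images a threshold-based GC would remove.

    Simplified model: do nothing below the high threshold; otherwise
    remove unused images, least recently used first, until enough space
    is freed to bring usage down to the low threshold.
    """
    usage_pct = 100 * disk_used_gb / disk_capacity_gb
    if usage_pct < high_pct:
        # Below the high threshold, unused images are left on disk.
        return []
    to_free = disk_used_gb - disk_capacity_gb * low_pct / 100
    removed: List[str] = []
    freed = 0.0
    unused = sorted((i for i in images if not i.in_use), key=lambda i: i.last_used)
    for img in unused:
        if freed >= to_free:
            break
        removed.append(img.name)
        freed += img.size_gb
    return removed


if __name__ == "__main__":
    imgs = [
        Image("a", 6, last_used=1, in_use=False),
        Image("b", 6, last_used=2, in_use=False),
        Image("c", 10, last_used=0, in_use=True),
    ]
    print(images_to_collect(imgs, disk_capacity_gb=100, disk_used_gb=50))
    print(images_to_collect(imgs, disk_capacity_gb=100, disk_used_gb=90))
```

In this model an unused image can stay on the node indefinitely while disk usage remains under the high threshold, which matches the limitation described in the question.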

Kubernetes NGINX Ingress changes HTTP request from a POST to a GET

╄→гoц情女王★ submitted on 2021-02-19 07:18:05
Question: I'm using the Kubernetes that is bundled with Docker-for-Mac. I'm trying to configure an Ingress that routes HTTP requests starting with /v1/ to my backend service and /ui/ requests to my Angular app. My issue seems to be that the HTTP method of the requests is changed by the ingress (NGINX) from POST to GET. I have tried various rewrite rules, but to no avail. I even switched from Docker-for-Mac to Minikube, but the result is the same. If I use a simple ingress with no paths (just the default