<div id="cnblogs_post_body" class="blogpost-body "> <h1 class="entry-title">Installing Prometheus with Persistent Storage</h1> <p>Prometheus stores its data on an NFS-backed volume, and its configuration file is managed with a <code>ConfigMap</code>. All Prometheus resources are created in the <code>kube-system</code> namespace.</p> <div class="cnblogs_code"> <pre># Keep all Prometheus YAML files in one directory
mkdir -p /opt/prometheus &amp;&amp; cd /opt/prometheus

# Generate the configuration file
cat &gt; prometheus.configmap.yaml &lt;&lt;EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-system
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 15s
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']
EOF</pre>
<p>Notes on the configuration (this ConfigMap is simply the Prometheus configuration file):</p> <p>A Prometheus configuration typically has three top-level sections: <code>global</code>, <code>rule_files</code> and <code>scrape_configs</code>. The ConfigMap above only uses <code>global</code> and <code>scrape_configs</code>.</p> <ul> <li><code>global</code> controls server-wide settings. <code>scrape_interval</code> is how often Prometheus scrapes metrics from its targets; it is set to 15s here (the built-in default is 1m) and can be overridden per job. <code>evaluation_interval</code> controls how often rules are evaluated; Prometheus uses rules to produce new time series or to fire alerts.</li> <li><code>rule_files</code> specifies where rule files are located. Prometheus loads rules from these files to produce new time series or alerts. No rules are configured yet; they will be added later.</li> <li><code>scrape_configs</code> controls which resources Prometheus monitors. Because Prometheus exposes its own metrics over HTTP, it can also monitor its own health. The configuration contains a single job named <code>prometheus</code> that collects the time series of the Prometheus server itself. This job has one statically configured target, port 9090 on <code>localhost</code>. By default Prometheus scrapes the <code>/metrics</code> path of each target, so this job collects metrics from <code>http://localhost:9090/metrics</code>; the resulting series describe the state and performance of the Prometheus server. Any other resources to monitor can be added as further jobs in this section.</li> </ul>
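<p>To make the three sections concrete, here is a sketch (an assumption, not part of the original setup) of the same ConfigMap extended with <code>evaluation_interval</code> and <code>rule_files</code>. The key <code>rules.yml</code>, the group name <code>example</code> and the recording rule <code>job:up:sum</code> are hypothetical. Because the Deployment mounts the whole ConfigMap at <code>/etc/prometheus</code>, every key appears there as a file, which is why <code>rule_files</code> points at <code>/etc/prometheus/rules.yml</code>.</p>

```shell
# Sketch only: writes an extended version of prometheus.configmap.yaml.
# "rules.yml", the "example" group and the "job:up:sum" rule are hypothetical.
write_prometheus_configmap() {
  cat > "$1" <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-system
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 15s
      evaluation_interval: 15s     # how often rules are evaluated
    rule_files:
      - /etc/prometheus/rules.yml  # the "rules.yml" key below, mounted as a file
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']
  rules.yml: |
    groups:
      - name: example
        rules:
          # recording rule: number of healthy (up == 1) targets per job
          - record: job:up:sum
            expr: sum by (job) (up)
EOF
}
```

<p>Usage: <code>write_prometheus_configmap prometheus.configmap.yaml</code>, then <code>kubectl apply -f prometheus.configmap.yaml</code>.</p>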
</div> <p>Then create the resource object:</p> <div class="cnblogs_code"> <pre>[root@k8s-01 prometheus]# kubectl apply -f prometheus.configmap.yaml
configmap/prometheus-config created</pre> </div> <div class="cnblogs_code"> <pre>[root@k8s-01 prometheus]# kubectl get configmaps -n kube-system | grep prometheus
prometheus-config   1      163m</pre> </div> <p>The configuration file is now in place. If new resources need to be monitored later, we only have to update this <code>ConfigMap</code> object. Next, create the Prometheus Pod:</p> <div class="cnblogs_code"> <pre>[root@k8s-01 prometheus]# cat &gt; prometheus.deploy.yaml &lt;&lt;EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    app: prometheus
spec:
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
      - image: prom/prometheus:v2.4.3
        name: prometheus
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus"
        - "--storage.tsdb.retention=30d"
        - "--web.enable-admin-api"  # enables the admin HTTP API, which includes operations such as deleting time series
        - "--web.enable-lifecycle"  # enables hot reload: a POST to localhost:9090/-/reload applies the config immediately
        ports:
        - containerPort: 9090
          protocol: TCP
          name: http
        volumeMounts:
        - mountPath: "/prometheus"
          subPath: prometheus
          name: data
        - mountPath: "/etc/prometheus"
          name: config-volume
        resources:
          requests:
            cpu: 100m
            memory: 512Mi
          limits:
            cpu: 100m
            memory: 512Mi
      securityContext:
        runAsUser: 0
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: prometheus
      - configMap:
          name: prometheus-config
        name: config-volume
EOF</pre>
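<p>Because the Deployment above sets <code>--web.enable-lifecycle</code>, the configuration can be reloaded over HTTP. A minimal sketch, assuming <code>curl</code> is installed and Prometheus is reachable at the given host and port (for example <code>localhost:9090</code> through <code>kubectl port-forward</code>, or a node IP plus the NodePort). <code>prometheus_reload_url</code> and <code>reload_prometheus</code> are hypothetical helpers, not part of Prometheus.</p>

```shell
# Hypothetical helpers for the lifecycle API enabled by --web.enable-lifecycle.

# prometheus_reload_url [HOST] [PORT]: print the reload endpoint URL.
prometheus_reload_url() {
  local host="${1:-localhost}" port="${2:-9090}"
  printf 'http://%s:%s/-/reload\n' "$host" "$port"
}

# reload_prometheus [HOST] [PORT]: ask Prometheus to re-read its config file.
# -f makes curl exit non-zero on an HTTP error status, e.g. when the
# lifecycle API is not enabled.
reload_prometheus() {
  curl -fsS -X POST "$(prometheus_reload_url "$@")"
}
```

<p>For example, run <code>kubectl -n kube-system port-forward deploy/prometheus 9090:9090</code> in another terminal and then call <code>reload_prometheus</code>. After editing the ConfigMap, wait until the kubelet has refreshed the mounted file before reloading; otherwise Prometheus simply re-reads the old file.</p>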
</div> <p>Besides the configuration file <code>prometheus.yml</code> (from the ConfigMap), the startup arguments set the TSDB data directory with <code>--storage.tsdb.path</code> and how long data is kept with <code>--storage.tsdb.retention</code>. <code>--web.enable-admin-api</code> enables access to the admin API, and <code>--web.enable-lifecycle</code> enables hot reloading: once <code>prometheus.yml</code> (the ConfigMap) has been updated, sending an HTTP POST request to <code>localhost:9090/-/reload</code> makes the new configuration take effect immediately.</p> <p>We also added a <code>securityContext</code> with <code>runAsUser</code> set to 0 (root). The Prometheus image runs as the <code>nobody</code> user, and without this setting you may run into permission problems when it writes to the data directory.</p> <p>Setting up NFS</p> <div class="cnblogs_code"> <pre># Install the NFS tools and rpcbind on every node
for i in k8s-01 k8s-02 k8s-03; do ssh root@$i "yum install nfs-utils rpcbind -y"; done

# Then set up the NFS server on any one machine; the other machines only mount it.
# Here the NFS server is 192.168.0.200.
# On the NFS server:
mkdir -p /home/kvm
echo "/home/kvm *(rw,no_root_squash,sync)" &gt;&gt; /etc/exports
systemctl start rpcbind
systemctl enable rpcbind
systemctl start nfs
systemctl enable nfs

# On the other Kubernetes nodes, just start rpcbind and mount the export:
systemctl start rpcbind
systemctl enable rpcbind
mkdir -p /data/k8s
mount -t nfs 192.168.0.200:/home/kvm /data/k8s</pre>
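<p>The PV created later expects the export to be mounted and writable. A small sketch, assuming a POSIX shell on the node, that checks this for the mount point used above (<code>/data/k8s</code>); <code>check_nfs_writable</code> is a hypothetical helper.</p>

```shell
# check_nfs_writable DIR
# Succeeds if DIR exists and a file can be created and removed in it.
# Only warns if DIR is not a mount point, which usually means the NFS
# export is not mounted and writes would land on the local disk instead.
check_nfs_writable() {
  local dir="$1" probe
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  if command -v mountpoint >/dev/null 2>&1 && ! mountpoint -q "$dir"; then
    echo "warning: $dir is not a mount point" >&2
  fi
  # mktemp creates the probe file; failure means DIR is not writable
  if ! probe="$(mktemp "$dir/.write-test.XXXXXX" 2>/dev/null)"; then
    echo "cannot write to $dir" >&2
    return 1
  fi
  rm -f "$probe"
  echo "ok: $dir is writable"
}
```

<p>Usage on a node: <code>check_nfs_writable /data/k8s</code>.</p>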
</div> <p>The ConfigMap holding <code>prometheus.yml</code> is mounted into the Pod as a volume, so when the ConfigMap is updated, the kubelet also updates the mounted file in the running Pod (after a short sync delay); sending the reload request described above then makes the new configuration take effect. In addition, to persist the time-series data, the data directory is bound to a PVC object, so the PVC has to be created beforehand.</p> <div class="cnblogs_code"> <pre>[root@k8s-01 prometheus]# cat &gt; prometheus-volume.yaml &lt;&lt;EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    server: 192.168.0.200
    path: /home/kvm/k8s-vloume
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus
  namespace: kube-system
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF
# nfs.server: IP of the NFS server; nfs.path: exported directory. Create it in advance and make sure it is writable.</pre>
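<p>The PVC is only usable once it is bound to the PV, which requires the PV to offer at least the requested capacity and the requested access mode. A sketch for waiting on that after running the create command that follows, assuming <code>kubectl</code> is configured for the cluster. <code>wait_pvc_bound</code> and the <code>KUBECTL</code> override variable are hypothetical; the phase is read with kubectl's standard <code>-o jsonpath</code> output.</p>

```shell
# wait_pvc_bound NAMESPACE NAME [TIMEOUT_SECONDS]
# Polls the PVC's .status.phase once per second until it is "Bound".
# Set KUBECTL to another command (e.g. a stub) to override kubectl.
wait_pvc_bound() {
  local ns="$1" name="$2" timeout="${3:-60}" phase="" i=0
  while [ "$i" -lt "$timeout" ]; do
    phase="$(${KUBECTL:-kubectl} -n "$ns" get pvc "$name" \
      -o jsonpath='{.status.phase}' 2>/dev/null)"
    if [ "$phase" = "Bound" ]; then
      echo "pvc $ns/$name is Bound"
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "pvc $ns/$name not Bound after ${timeout}s (last phase: ${phase:-unknown})" >&2
  return 1
}
```

<p>Usage: <code>wait_pvc_bound kube-system prometheus</code>.</p>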
</div> <p>Here a simple NFS share is used as the storage backend to create a PV and a PVC:</p> <div class="cnblogs_code"> <pre>[root@k8s-01 prometheus]# kubectl create -f prometheus-volume.yaml
persistentvolume/prometheus created
persistentvolumeclaim/prometheus created</pre> </div> <p>We also need to create RBAC objects, because Prometheus needs access to resources inside the Kubernetes cluster:</p> <div class="cnblogs_code"> <pre>cat &gt; prometheus-rbac.yaml &lt;&lt;EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: kube-system
EOF</pre> </div> <p>The resources we want to read can exist in any <code>namespace</code>, so a <code>ClusterRole</code> is used here. <code>nonResourceURLs</code> declares permissions on non-resource URLs such as the <code>/metrics</code> endpoint.</p> <div class="cnblogs_code"> <pre># Create the RBAC objects
[root@k8s-01 prometheus]# kubectl create -f prometheus-rbac.yaml
serviceaccount/prometheus created
clusterrole.rbac.authorization.k8s.io/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created</pre> </div> <p>With the <code>ConfigMap</code>, the volume and the RBAC objects in place, we can create <code>prometheus.deploy.yaml</code> to run the Prometheus service:</p> <div class="cnblogs_code"> <pre>[root@k8s-01 prometheus]# kubectl create -f prometheus.deploy.yaml
deployment.extensions/prometheus created
[root@k8s-01 prometheus]# kubectl get pod -n kube-system | grep prometheus
prometheus-847494df74-zbz9v   1/1     Running   0          148m
# READY 1/1 and STATUS Running means the Pod is up</pre>
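<p>Instead of reading the <code>READY</code> and <code>STATUS</code> columns by eye, a small sketch can check them. The hypothetical helper <code>all_pods_ready</code> reads <code>kubectl get pod</code> output without the header line (kubectl's <code>--no-headers</code> flag) and succeeds only if every Pod is fully ready and <code>Running</code>.</p>

```shell
# all_pods_ready: reads `kubectl get pod --no-headers` lines on stdin.
# Column 2 is READY ("ready/total"), column 3 is STATUS.
# Exits 0 only if there is at least one Pod and every Pod is n/n Running.
all_pods_ready() {
  awk '
    { n++; split($2, r, "/"); if (r[1] != r[2] || $3 != "Running") bad++ }
    END { exit (n == 0 || bad > 0) }
  '
}
```

<p>Usage, selecting the Pod by the <code>app: prometheus</code> label from the Deployment: <code>kubectl get pod -n kube-system -l app=prometheus --no-headers | all_pods_ready &amp;&amp; echo "prometheus is up"</code>.</p>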
</div> <p>The Prometheus Pod is now running normally, but the web UI cannot be reached from a browser yet, so we also need to create a Service:</p> <div class="cnblogs_code"> <pre>cat &gt; prometheus-svc.yaml &lt;&lt;EOF
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: prometheus
  labels:
    app: prometheus
spec:
  type: NodePort
  selector:
    app: prometheus
  ports:
  - name: http
    port: 9090
EOF</pre> </div> <div class="cnblogs_code"> <pre>[root@k8s-01 prometheus]# kubectl create -f prometheus-svc.yaml
service/prometheus created
[root@k8s-01 prometheus]# kubectl get svc -n kube-system | grep prometheus
prometheus   NodePort   10.1.183.250   &lt;none&gt;   9090:30129/TCP   148m</pre>
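<p>In the <code>PORT(S)</code> column, <code>9090:30129/TCP</code> means Service port 9090 is exposed on node port 30129 over TCP. A sketch for extracting the node port from that column (single-port Services only); <code>nodeport_from_ports</code> is a hypothetical helper and 192.168.0.10 is a placeholder node IP. kubectl can also print the field directly with <code>kubectl -n kube-system get svc prometheus -o jsonpath='{.spec.ports[0].nodePort}'</code>.</p>

```shell
# nodeport_from_ports PORTS
# PORTS is the PORT(S) column of `kubectl get svc`, e.g. "9090:30129/TCP".
# Prints the node port; fails if there is none (e.g. a ClusterIP "9090/TCP").
nodeport_from_ports() {
  case "$1" in
    *:*/*) ;;
    *) return 1 ;;
  esac
  local rest="${1#*:}"           # "30129/TCP"
  printf '%s\n' "${rest%%/*}"    # "30129"
}

# Build the web UI URL for a node (placeholder IP)
echo "http://192.168.0.10:$(nodeport_from_ports '9090:30129/TCP')"
# -> http://192.168.0.10:30129
```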
</div> <p>The Service is exposed on a NodePort (30129 in the output above; by default Kubernetes picks a port in the 30000-32767 range), so open the IP of any node plus that port in a browser:</p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191205141604318-523854267.png" alt=""></p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191205142216377-18202296.png" alt=""></p> <p>Let's look at the current scrape targets.</p> <blockquote> <p>By default, Prometheus monitors itself.</p> </blockquote> <p><code>Status--&gt;Targets</code></p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191205142250774-650875121.png" alt=""></p> <p>Then check whether data is being collected:</p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191205142343107-722214758.png" alt=""></p> <h1 class="entry-title">Monitoring Kubernetes Cluster Nodes and Applications with Prometheus</h1> <p>Monitoring a Kubernetes cluster generally involves the following aspects:</p> <ul> <li>Node monitoring, for example node CPU, load, disk and memory metrics</li> <li>The state of internal system components, for example whether kube-scheduler, kube-controller-manager and kubedns/coredns are running properly</li> <li>Orchestration-level metrics, for example Deployment status, resource requests, scheduling and API latency</li> </ul> <h2>Monitoring options</h2> <p>The main options for monitoring a Kubernetes cluster are:</p> <ul> <li>Heapster: Heapster is a cluster-wide monitoring and data aggregation tool that runs in the cluster as a Pod.</li> </ul> <p><img src="http://static.zybuluo.com/abcdocker/a4un82s62oe31bd8kjveugxb/kubernetes_monitoring_heapster.png" alt="Heapster monitoring architecture"></p> <p>Besides Kubelet/cAdvisor, other metric sources such as kube-state-metrics can be added to Heapster.</p> <blockquote> <p>Heapster has been deprecated and replaced by metrics-server.</p> </blockquote> <ul> <li>cAdvisor: <a href="https://github.com/google/cadvisor" target="_blank">cAdvisor</a> is Google's open-source tool for container resource monitoring and performance analysis. It is built specifically for containers and supports Docker containers. In Kubernetes it does not need to be installed separately, because cAdvisor is built into the kubelet and can be used directly.</li> <li><a href="https://github.com/kubernetes/kube-state-metrics" target="_blank">kube-state-metrics</a>: generates state metrics about resource objects such as Deployments, Nodes and Pods by listening to the API server. Note that kube-state-metrics only exposes these metrics and does not store them, so we can use Prometheus to scrape and store them.</li> <li>metrics-server: metrics-server is also a cluster-wide resource metrics aggregator and the replacement for Heapster. Likewise, it only exposes data and does not provide storage.</li> </ul> <p>However, <code>kube-state-metrics</code> and <code>metrics-server</code> differ considerably:</p> <pre class="ql-syntax">1. kube-state-metrics focuses on business-related metadata, such as Deployment, Pod and replica status.
2. metrics-server focuses on implementing the resource metrics API, i.e. CPU and memory usage of nodes and Pods.
</pre> <p><a href="https://github.com/kubernetes/community/blob/master/contributors/design-proposals/instrumentation/resource-metrics-api.md" target="_blank">Resource metrics API</a></p> <hr> <h2>Monitoring cluster nodes</h2> <p>First we need to monitor the cluster nodes. There are already many mature solutions for node monitoring, such as Nagios and Zabbix, and we could even collect the data ourselves. Here Prometheus collects node metrics through node_exporter, which gathers various runtime metrics from server nodes. node_exporter supports almost all common metrics, such as cpu, diskstats, loadavg, meminfo and netstat; see the <a href="https://github.com/prometheus/node_exporter" target="_blank">GitHub repo</a> for the full list.</p> <p>The service is deployed with a <code>DaemonSet</code> controller, so one Pod runs on every node, and Pods are added or removed automatically when nodes join or leave the cluster.</p> <div class="cnblogs_code"> <pre>[root@k8s-01 prometheus]# cat &gt; prometheus-node-exporter.yaml &lt;&lt;'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    name: node-exporter
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
      name: node-exporter
  template:
    metadata:
      labels:
        name: node-exporter
        app: node-exporter
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
      - name: node-exporter
        image: prom/node-exporter:v0.16.0
        ports:
        - containerPort: 9100
        resources:
          requests:
            cpu: 0.15
        securityContext:
          privileged: true
        args:
        - --path.procfs
        - /host/proc
        - --path.sysfs
        - /host/sys
        - --collector.filesystem.ignored-mount-points
        - '^/(sys|proc|dev|host|etc)($|/)'
        volumeMounts:
        - name: dev
          mountPath: /host/dev
        - name: proc
          mountPath: /host/proc
        - name: sys
          mountPath: /host/sys
        - name: rootfs
          mountPath: /rootfs
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: dev
        hostPath:
          path: /dev
      - name: sys
        hostPath:
          path: /sys
      - name: rootfs
        hostPath:
          path: /
EOF</pre>
</div> <p>Create node-exporter and check the Pods:</p> <div class="cnblogs_code"> <pre>[root@k8s-01 prometheus]# kubectl create -f prometheus-node-exporter.yaml
daemonset.extensions/node-exporter created
[root@k8s-01 prometheus]# kubectl get pod -n kube-system -o wide | grep node
node-exporter-bsdkl   1/1   Running   0   36m   192.168.122.217   k8s-02   &lt;none&gt;   &lt;none&gt;
node-exporter-f8wrt   1/1   Running   0   36m   192.168.122.2     k8s-01   &lt;none&gt;   &lt;none&gt;
node-exporter-gjhvz   1/1   Running   0   36m   192.168.122.165   k8s-03   &lt;none&gt;   &lt;none&gt;
# There are 3 nodes, and a node-exporter Pod is running on each of them to collect data</pre>
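<p>For a DaemonSet, "one Pod per node" can also be checked from its status fields <code>desiredNumberScheduled</code> and <code>numberReady</code>. A sketch in which <code>ds_ready</code> is a hypothetical helper that takes the two numbers as printed by kubectl's jsonpath output.</p>

```shell
# ds_ready "DESIRED READY"
# Succeeds (0) if the DaemonSet wants at least one Pod and all are ready,
# returns 1 if not all are ready, and 2 on malformed input (e.g. a missing field).
ds_ready() {
  # split "DESIRED READY" into two fields (word splitting is intended here)
  set -- $1
  [ "$#" -eq 2 ] || return 2
  case "$1$2" in
    ''|*[!0-9]*) return 2 ;;
  esac
  [ "$1" -gt 0 ] && [ "$1" -eq "$2" ]
}
```

<p>Usage: <code>ds_ready "$(kubectl -n kube-system get ds node-exporter -o jsonpath='{.status.desiredNumberScheduled} {.status.numberReady}')" &amp;&amp; echo "node-exporter runs on every node"</code>.</p>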
</div> <p>Notes on node-exporter.yaml</p> <p>We want host-level metrics, but node-exporter runs in a container, so the Pod needs some extra security settings:</p> <div class="cnblogs_code"> <pre>hostPID: true
hostIPC: true
hostNetwork: true
# These three settings make the Pod use the host's PID namespace, IPC namespace and network. Note that "namespace" here means the Linux kernel feature that isolates containers, a completely different concept from namespaces in a Kubernetes cluster.</pre>
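<p>Whether two processes share a Linux namespace can be seen under <code>/proc/&lt;pid&gt;/ns/</code>: each entry is a link such as <code>net:[4026531992]</code> (the number varies), and equal links mean the same namespace. A sketch to run as root on a node, where <code>same_ns</code> is a hypothetical helper. With <code>hostPID</code>, <code>hostIPC</code> and <code>hostNetwork</code> set, the node_exporter process should share the pid, ipc and net namespaces of the host's PID 1.</p>

```shell
# same_ns TYPE PID1 PID2
# Succeeds if both processes are in the same namespace of TYPE (pid, ipc, net, ...).
# Returns 2 if either namespace link cannot be read (no such process, no permission).
same_ns() {
  local type="$1" a b
  a="$(readlink "/proc/$2/ns/$type")" || return 2
  b="$(readlink "/proc/$3/ns/$type")" || return 2
  [ "$a" = "$b" ]
}
```

<p>Usage on a node: <code>for t in pid ipc net; do same_ns "$t" 1 "$(pgrep -o node_exporter)" &amp;&amp; echo "$t: shared with host"; done</code>.</p>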
</div> <p>We also need to mount the host's <code>/dev</code>, <code>/proc</code> and <code>/sys</code> directories into the container, because much of the node data we collect is read from files in these directories.</p> <blockquote> <p>For example, the CPU usage shown by <code>top</code> comes from <code>/proc/stat</code>, and the memory usage shown by <code>free</code> comes from <code>/proc/meminfo</code>.</p> </blockquote> <p>If the cluster was set up with <code>kubeadm</code> and the master nodes should be monitored as well, add the following toleration:</p> <pre class="ql-syntax">tolerations:
- key: "node-role.kubernetes.io/master"
  operator: "Exists"
  effect: "NoSchedule"
</pre> <p>node-exporter container startup arguments:</p> <pre class="ql-syntax">args:
- --path.procfs   # where the host's (node's) /proc is mounted in the container
- /host/proc
- --path.sysfs    # where the host's (node's) /sys is mounted in the container
- /host/sys
- --collector.filesystem.ignored-mount-points
- '^/(sys|proc|dev|host|etc)($|/)'
</pre> <p>Because the YAML sets <code>hostNetwork: true</code>, node-exporter shares the host's network and listens directly on port 9100 of each host, so no Service is needed.</p> <p><code>container port 9100 ---&gt; host port 9100</code></p> <pre class="ql-syntax">hostNetwork: true
containers:
- name: node-exporter
  image: prom/node-exporter:v0.16.0
  ports:
  - containerPort: 9100
</pre> <p>We have already confirmed above that all Pods are running. Next, check the Pod logs and the metrics exposed by node-exporter.</p> <p>Use <code>kubectl logs -n &lt;namespace&gt; &lt;node-exporter Pod name&gt;</code> to check the Pod logs for errors:</p> <div class="cnblogs_code"> <pre>[root@k8s-01 prometheus]# kubectl logs -n kube-system node-exporter-bsdkl
time="2019-12-05T05:50:42Z" level=info msg=<span style="color: 
#800000;">"Starting node_exporter (version=0.16.0, branch=HEAD, revision=d42bd70f4363dced6b77d8fc311ea57b63387e4f)"</span> source="node_exporter.go:82"
time="2019-12-05T05:50:42Z" level=info msg="Build context (go=go1.9.6, user=root@a67a9bc13a69, date=20180515-15:52:42)" source="node_exporter.go:83"
time="2019-12-05T05:50:42Z" level=info msg="Enabled collectors:" source="node_exporter.go:90"
time="2019-12-05T05:50:42Z" level=info msg=" - arp" source="node_exporter.go:97"
time="2019-12-05T05:50:42Z" level=info msg=" - bcache" source="node_exporter.go:97"
time="2019-12-05T05:50:42Z" level=info msg=" - bonding" source="node_exporter.go:97"
time="2019-12-05T05:50:42Z" level=info msg=" - conntrack" source="node_exporter.go:97"
time="2019-12-05T05:50:42Z" level=info msg=" - cpu" source="node_exporter.go:97"
time="2019-12-05T05:50:42Z" level=info msg=" - diskstats" source="node_exporter.go:97"
time="2019-12-05T05:50:42Z" level=info msg=" - edac" source="node_exporter.go:97"
time="2019-12-05T05:50:42Z" level=info msg=" - entropy" source="node_exporter.go:97"
time="2019-12-05T05:50:42Z" level=info msg=" - filefd" source="node_exporter.go:97"
time="2019-12-05T05:50:42Z" level=info msg=" - filesystem" source="node_exporter.go:97"
time="2019-12-05T05:50:42Z" level=info msg=" - hwmon" source="node_exporter.go:97"
time="2019-12-05T05:50:42Z" level=info msg=" - infiniband" source="node_exporter.go:97"
time="2019-12-05T05:50:42Z" level=info msg=" - ipvs" source="node_exporter.go:97"
time="2019-12-05T05:50:42Z" level=info msg=" - loadavg" source="node_exporter.go:97"
time="2019-12-05T05:50:42Z" level=info msg=" - mdadm" source="node_exporter.go:97"
time="2019-12-05T05:50:42Z" level=info msg=" - meminfo" source="node_exporter.go:97"
time="2019-12-05T05:50:42Z" level=info msg=" - netdev" source="node_exporter.go:97"
time="<span style="color: 
#800000;">2019-12-05T05:50:42Z</span><span style="color: #800000;">"</span> level=<span style="color: #0000ff;">info</span> msg=<span style="color: #800000;">"</span><span style="color: #800000;"> - netstat</span><span style="color: #800000;">"</span> source=<span style="color: #800000;">"</span><span style="color: #800000;">node_exporter.go:97</span><span style="color: #800000;">"</span> <span style="color: #0000ff;">time</span>=<span style="color: #800000;">"</span><span style="color: #800000;">2019-12-05T05:50:42Z</span><span style="color: #800000;">"</span> level=<span style="color: #0000ff;">info</span> msg=<span style="color: #800000;">"</span><span style="color: #800000;"> - nfs</span><span style="color: #800000;">"</span> source=<span style="color: #800000;">"</span><span style="color: #800000;">node_exporter.go:97</span><span style="color: #800000;">"</span> <span style="color: #0000ff;">time</span>=<span style="color: #800000;">"</span><span style="color: #800000;">2019-12-05T05:50:42Z</span><span style="color: #800000;">"</span> level=<span style="color: #0000ff;">info</span> msg=<span style="color: #800000;">"</span><span style="color: #800000;"> - nfsd</span><span style="color: #800000;">"</span> source=<span style="color: #800000;">"</span><span style="color: #800000;">node_exporter.go:97</span><span style="color: #800000;">"</span> <span style="color: #0000ff;">time</span>=<span style="color: #800000;">"</span><span style="color: #800000;">2019-12-05T05:50:42Z</span><span style="color: #800000;">"</span> level=<span style="color: #0000ff;">info</span> msg=<span style="color: #800000;">"</span><span style="color: #800000;"> - sockstat</span><span style="color: #800000;">"</span> source=<span style="color: #800000;">"</span><span style="color: #800000;">node_exporter.go:97</span><span style="color: #800000;">"</span> <span style="color: #0000ff;">time</span>=<span style="color: #800000;">"</span><span style="color: 
#800000;">2019-12-05T05:50:42Z</span><span style="color: #800000;">"</span> level=<span style="color: #0000ff;">info</span> msg=<span style="color: #800000;">"</span><span style="color: #800000;"> - stat</span><span style="color: #800000;">"</span> source=<span style="color: #800000;">"</span><span style="color: #800000;">node_exporter.go:97</span><span style="color: #800000;">"</span> <span style="color: #0000ff;">time</span>=<span style="color: #800000;">"</span><span style="color: #800000;">2019-12-05T05:50:42Z</span><span style="color: #800000;">"</span> level=<span style="color: #0000ff;">info</span> msg=<span style="color: #800000;">"</span><span style="color: #800000;"> - textfile</span><span style="color: #800000;">"</span> source=<span style="color: #800000;">"</span><span style="color: #800000;">node_exporter.go:97</span><span style="color: #800000;">"</span> <span style="color: #0000ff;">time</span>=<span style="color: #800000;">"</span><span style="color: #800000;">2019-12-05T05:50:42Z</span><span style="color: #800000;">"</span> level=<span style="color: #0000ff;">info</span> msg=<span style="color: #800000;">"</span><span style="color: #800000;"> - time</span><span style="color: #800000;">"</span> source=<span style="color: #800000;">"</span><span style="color: #800000;">node_exporter.go:97</span><span style="color: #800000;">"</span> <span style="color: #0000ff;">time</span>=<span style="color: #800000;">"</span><span style="color: #800000;">2019-12-05T05:50:42Z</span><span style="color: #800000;">"</span> level=<span style="color: #0000ff;">info</span> msg=<span style="color: #800000;">"</span><span style="color: #800000;"> - timex</span><span style="color: #800000;">"</span> source=<span style="color: #800000;">"</span><span style="color: #800000;">node_exporter.go:97</span><span style="color: #800000;">"</span> <span style="color: #0000ff;">time</span>=<span style="color: #800000;">"</span><span style="color: 
#800000;">2019-12-05T05:50:42Z</span><span style="color: #800000;">"</span> level=<span style="color: #0000ff;">info</span> msg=<span style="color: #800000;">"</span><span style="color: #800000;"> - uname</span><span style="color: #800000;">"</span> source=<span style="color: #800000;">"</span><span style="color: #800000;">node_exporter.go:97</span><span style="color: #800000;">"</span> <span style="color: #0000ff;">time</span>=<span style="color: #800000;">"</span><span style="color: #800000;">2019-12-05T05:50:42Z</span><span style="color: #800000;">"</span> level=<span style="color: #0000ff;">info</span> msg=<span style="color: #800000;">"</span><span style="color: #800000;"> - vmstat</span><span style="color: #800000;">"</span> source=<span style="color: #800000;">"</span><span style="color: #800000;">node_exporter.go:97</span><span style="color: #800000;">"</span> <span style="color: #0000ff;">time</span>=<span style="color: #800000;">"</span><span style="color: #800000;">2019-12-05T05:50:42Z</span><span style="color: #800000;">"</span> level=<span style="color: #0000ff;">info</span> msg=<span style="color: #800000;">"</span><span style="color: #800000;"> - wifi</span><span style="color: #800000;">"</span> source=<span style="color: #800000;">"</span><span style="color: #800000;">node_exporter.go:97</span><span style="color: #800000;">"</span> <span style="color: #0000ff;">time</span>=<span style="color: #800000;">"</span><span style="color: #800000;">2019-12-05T05:50:42Z</span><span style="color: #800000;">"</span> level=<span style="color: #0000ff;">info</span> msg=<span style="color: #800000;">"</span><span style="color: #800000;"> - xfs</span><span style="color: #800000;">"</span> source=<span style="color: #800000;">"</span><span style="color: #800000;">node_exporter.go:97</span><span style="color: #800000;">"</span> <span style="color: #0000ff;">time</span>=<span style="color: #800000;">"</span><span style="color: 
#800000;">2019-12-05T05:50:42Z</span><span style="color: #800000;">"</span> level=<span style="color: #0000ff;">info</span> msg=<span style="color: #800000;">"</span><span style="color: #800000;"> - zfs</span><span style="color: #800000;">"</span> source=<span style="color: #800000;">"</span><span style="color: #800000;">node_exporter.go:97</span><span style="color: #800000;">"</span> <span style="color: #0000ff;">time</span>=<span style="color: #800000;">"</span><span style="color: #800000;">2019-12-05T05:50:42Z</span><span style="color: #800000;">"</span> level=<span style="color: #0000ff;">info</span> msg=<span style="color: #800000;">"</span><span style="color: #800000;">Listening on :9100</span><span style="color: #800000;">"</span> source=<span style="color: #800000;">"</span><span style="color: #800000;">node_exporter.go:111</span><span style="color: #800000;">"</span></pre> </div> <pre class="ql-syntax"># Next, curl :9100/metrics on any cluster node</pre> <div class="cnblogs_code"> <pre>[root@k8s-<span style="color: #800080;">01</span> prometheus]# curl <span style="color: #800080;">127.0</span>.<span style="color: #800080;">0.1</span>:<span style="color: #800080;">9100</span>/metrics|<span style="color: #0000ff;">head</span> % Total % Received %<span style="color: #000000;"> Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed </span><span style="color: #800080;">0</span> <span style="color: #800080;">0</span> <span style="color: #800080;">0</span> <span style="color: #800080;">0</span> <span style="color: 
#800080;">0</span> <span style="color: #800080;">0</span> <span style="color: #800080;">0</span> <span style="color: #800080;">0</span> --:--:-- --:--:-- --:--:-- <span style="color: #800080;">0</span><span style="color: #000000;"># HELP go_gc_duration_seconds A summary of the GC invocation durations. # TYPE go_gc_duration_seconds summary go_gc_duration_seconds{quantile</span>=<span style="color: #800000;">"</span><span style="color: #800000;">0</span><span style="color: #800000;">"</span>} <span style="color: #800080;">0.000239179</span><span style="color: #000000;"> go_gc_duration_seconds{quantile</span>=<span style="color: #800000;">"</span><span style="color: #800000;">0.25</span><span style="color: #800000;">"</span>} <span style="color: #800080;">0.000322674</span><span style="color: #000000;"> go_gc_duration_seconds{quantile</span>=<span style="color: #800000;">"</span><span style="color: #800000;">0.5</span><span style="color: #800000;">"</span>} <span style="color: #800080;">0.000361148</span><span style="color: #000000;"> go_gc_duration_seconds{quantile</span>=<span style="color: #800000;">"</span><span style="color: #800000;">0.75</span><span style="color: #800000;">"</span>} <span style="color: #800080;">0.000416324</span><span style="color: #000000;"> go_gc_duration_seconds{quantile</span>=<span style="color: #800000;">"</span><span style="color: #800000;">1</span><span style="color: #800000;">"</span>} <span style="color: #800080;">0.000513074</span><span style="color: #000000;"> go_gc_duration_seconds_sum </span><span style="color: #800080;">0.006654219</span><span style="color: #000000;"> go_gc_duration_seconds_count </span><span style="color: #800080;">18</span><span style="color: #000000;"> # HELP go_goroutines Number of goroutines that currently exist. 
</span><span style="color: #800080;">100</span> <span style="color: #800080;">64669</span> <span style="color: #800080;">100</span> <span style="color: #800080;">64669</span> <span style="color: #800080;">0</span> <span style="color: #800080;">0</span> 2235k <span style="color: #800080;">0</span> --:--:-- --:--:-- --:--:--<span style="color: #000000;"> 2339k curl: (</span><span style="color: #800080;">23</span>) Failed writing body (<span style="color: #800080;">135</span> != <span style="color: #800080;">15652</span>)</pre> </div> <pre class="ql-syntax">As long as /metrics returns data, node-exporter is working. (The "curl: (23) Failed writing body" message only appears because head exits after printing its first lines and closes the pipe; it does not indicate a node-exporter problem.)</pre> <h2>Service discovery</h2> <p>All three nodes here run the <code>node-exporter</code> program. If we put a single Service in front of them and added it to Prometheus as a static target, Prometheus would show only one target, and we would have to pick each node's data out of the metrics ourselves, which is cumbersome. Service discovery is used instead.</p> <p>On Kubernetes, Prometheus builds its service discovery on the Kubernetes API. It currently supports five roles: <code>node</code>, <code>service</code>, <code>pod</code>, <code>endpoints</code>, and <code>ingress</code>.</p> <p>Add the following three lines to the Prometheus configuration file (under <code>scrape_configs</code>):</p> <div class="cnblogs_code"> <pre>- job_name: 'kubernetes-node'
  kubernetes_sd_configs:
    - role: node<span style="color: #000000;">
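# With the kubernetes_sd_configs role set to node, Prometheus automatically discovers every node in the cluster and uses each one as a target of this job. For a discovered node, the /metrics endpoint defaults to the kubelet's HTTP port.</span></pre>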
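<p>The bash sketch below illustrates how the address of a discovered node target is formed. Following the Prometheus documentation for the <code>node</code> role, it assumes the address is the first node address found in the order InternalIP, ExternalIP, LegacyHostIP, Hostname, combined with the kubelet port. The function name <code>node_target</code> and the fixed port 10250 are illustrative assumptions, not Prometheus code.</p>
<pre>#!/usr/bin/env bash
# Hypothetical sketch: how a role: node target address is chosen.
# Arguments are "Type=address" pairs like those in a Node's status.addresses.
# Assumption: the kubelet listens on its default port 10250.
node_target() {
  local kubelet_port=10250 type pair
  # Prometheus prefers the address types in this order
  for type in InternalIP ExternalIP LegacyHostIP Hostname; do
    for pair in "$@"; do
      if [ "${pair%%=*}" = "$type" ]; then
        echo "${pair#*=}:${kubelet_port}"
        return 0
      fi
    done
  done
  return 1   # the node reports no usable address
}

# Example: node k8s-01 (the IP address is illustrative)
node_target 'Hostname=k8s-01' 'InternalIP=192.168.122.217'
</pre>
<p>The example prints <code>192.168.122.217:10250</code>, which is why the discovered targets initially point at the kubelet port rather than at node-exporter's 9100.</p>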
</div> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191205143429576-1524671294.png" alt=""></p> <p>Next, update the configuration:</p> <div class="cnblogs_code"> <pre>[root@k8s-01 prometheus]# kubectl apply -f prometheus.configmap.yaml
configmap/prometheus-config configured
[root@k8s-01 prometheus]# kubectl get svc -n kube-system | grep prometheus
prometheus NodePort 10.1.183.250 &lt;none&gt; 9090:30129/TCP 169m
[root@k8s-01 prometheus]# curl -X POST http://10.1.183.250:9090/-/reload   # hot-reload the configuration (it takes a moment to take effect)</pre> </div> <p>Then open the Prometheus targets page:</p> <blockquote> <p><a href="http://192.168.122.217:30129/targets" target="_blank">http://192.168.122.217:30129/targets</a></p> </blockquote> <blockquote> <p>The port must match the NodePort of the prometheus Service (30129 above).</p> 
</blockquote> <p>We can now see the IPs of our nodes. However, the discovered targets point at port 10250 (the kubelet) instead of port 9100, where node-exporter listens, so the nodes are reported as DOWN.</p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191205143938768-1176555756.png" alt=""></p> <p>This is where the <code>replace</code> action of Prometheus <code>relabel_configs</code> comes in. Relabeling runs before Prometheus scrapes a target and can rewrite label values dynamically from the target's metadata; the same metadata can also be used to decide whether a target is scraped or dropped. Here we rewrite the <code>__address__</code> label, replacing port 10250 with 9100.</p> <p>The port is replaced with a regular expression:</p> <div class="cnblogs_code"> <pre>- job_name: 'kubernetes-node'
  kubernetes_sd_configs:
    - role: node
  relabel_configs:
    - source_labels: [__address__]
      regex: '(.*):10250'
      replacement: '${1}:9100'
      target_label: __address__
      action: replace</pre> </div> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191205144103111-1959814187.png" alt=""></p> <p>Next, update the configuration.</p> <p>You may have to send the reload request several times and wait a little: the kubelet refreshes a ConfigMap mounted into a Pod only after a delay, so an early reload can still read the old file.</p> <div class="cnblogs_code"> <pre>[root@k8s-<span style="color: #800080;">01</span> prometheus]# kubectl 
apply -f prometheus.configmap.yaml
configmap/prometheus-config configured
[root@k8s-01 prometheus]# curl -X POST http://10.1.183.250:9090/-/reload
[root@k8s-01 prometheus]# curl -X POST http://10.1.183.250:9090/-/reload</pre> </div> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191205144250316-775416620.png" alt=""></p> <p>The targets are now UP.</p> <p>One problem remains: each scraped target is identified only by its IP address, which is inconvenient for grouping and classifying what we monitor. The <code>labelmap</code> action can turn the Kubernetes node labels into Prometheus labels:</p> <div class="cnblogs_code"> <pre>- job_name: 'kubernetes-node'
  kubernetes_sd_configs:
    - role: node
  relabel_configs:
    - source_labels: [__address__]
      regex: '(.*):10250'
      replacement: '${1}:9100'
      target_label: __address__
      action: replace
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)</pre> <div class="cnblogs_code_toolbar"><span 
class="cnblogs_code_copy"><a href="javascript:void(0);" onclick="copyCnblogsCode(this)" title="复制代码"><img src="//common.cnblogs.com/images/copycode.gif" alt="复制代码"></a></span></div></div> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191205144433666-227822668.png" alt=""></p> <p>This adds a rule with action <code>labelmap</code> and regex <code>__meta_kubernetes_node_label_(.+)</code>. Every label whose name matches the regex is copied onto the target, renamed to the part captured by the group, so each Kubernetes node label also appears as a label on the scraped metrics. Characters that are not allowed in Prometheus label names, such as <code>.</code> and <code>/</code>, are replaced with underscores, so <code>kubernetes.io/hostname</code> becomes <code>kubernetes_io_hostname</code>.</p> <div class="cnblogs_code"> <pre>[root@k8s-01 prometheus]# kubectl apply -f prometheus.configmap.yaml
configmap/prometheus-config configured
[root@k8s-01 prometheus]# curl -X POST http://10.1.183.250:9090/-/reload
[root@k8s-01 prometheus]# curl -X POST http://10.1.183.250:9090/-/reload</pre> </div> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191205144736148-1105027452.png" alt=""></p> <p>These are simply the node's own Kubernetes labels, which can be listed with kubectl:</p> <div class="cnblogs_code"> <pre>[root@k8s-01 ~]# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
k8s-01 Ready master 30d v1.16.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-01,kubernetes.io/os=linux,node-role.kubernetes.io/master=<span 
style="color: #000000;"></span>
k8s-02 Ready &lt;none&gt; 30d v1.16.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-02,kubernetes.io/os=linux
k8s-03 Ready &lt;none&gt; 30d v1.16.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-03,kubernetes.io/os=linux</pre> </div> <p>For the <code>node</code> role, <code>kubernetes_sd_configs</code> provides the following meta labels:</p> <ul> <li><code>__meta_kubernetes_node_name</code>: the name of the node object</li> <li><code>__meta_kubernetes_node_label_&lt;labelname&gt;</code>: each label from the node object</li> <li><code>__meta_kubernetes_node_annotation_&lt;annotationname&gt;</code>: each annotation from the node object</li> <li><code>__meta_kubernetes_node_address_&lt;address_type&gt;</code>: the first address for each node address type, if it exists</li> </ul> <blockquote> <p>For more information about kubernetes_sd_configs, see the official documentation: <a href="https://prometheus.io/docs/prometheus/latest/configuration/configuration/#%3Ckubernetes_sd_config%3E" target="_blank">kubernetes_sd_config</a></p> </blockquote> <div class="cnblogs_code"> <pre># The complete Prometheus ConfigMap so far; it can be copied as-is
[root@k8s-01 prometheus]# cat prometheus.configmap.yaml<span style="color: #000000;">
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-system
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 15s
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']
      - job_name: 'kubernetes-node'
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:9100'
            target_label: __address__
            action: replace
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
      - job_name: 'kubernetes-cadvisor'
        kubernetes_sd_configs:
          - role: node
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor</span></pre>
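<p>The relabel rules are the least obvious part of this file, so the bash sketch below replays them for a single discovered node. It is a simplified illustration under stated assumptions, not Prometheus's own relabeling code: the function names are hypothetical, only the rule patterns used in this ConfigMap are handled, and the input IP address and labels are made-up examples (Kubernetes label names arrive with <code>.</code> and <code>/</code> already turned into underscores).</p>
<pre>#!/usr/bin/env bash
# Hypothetical sketch: replay the relabel_configs above for ONE discovered node.
# Only the patterns used in this ConfigMap are handled; this is an
# illustration, not Prometheus's relabeling implementation.

# kubernetes-node, rule 1 (action: replace): rewrite __address__ "IP:10250" as "IP:9100"
node_exporter_address() {
  local re='^(.*):10250$'           # Prometheus anchors the regex at both ends
  if [[ $1 =~ $re ]]; then
    echo "${BASH_REMATCH[1]}:9100"  # replacement '${1}:9100'
  else
    echo "$1"                       # no match: __address__ is left unchanged
  fi
}

# kubernetes-node, rule 2 (action: labelmap): for every "name=value" argument whose
# name matches __meta_kubernetes_node_label_(.+), print the new label "group1=value"
labelmap_node_labels() {
  local re='^__meta_kubernetes_node_label_(.+)$' pair
  for pair in "$@"; do
    if [[ ${pair%%=*} =~ $re ]]; then
      echo "${BASH_REMATCH[1]}=${pair#*=}"
    fi
  done
}

# kubernetes-cadvisor: __address__ is fixed to the apiserver, __metrics_path__ is
# built from __meta_kubernetes_node_name (regex (.+) matches any non-empty name),
# and scheme: https completes the URL
cadvisor_url() {
  echo "https://kubernetes.default.svc:443/api/v1/nodes/$1/proxy/metrics/cadvisor"
}

# Example node k8s-01 (IP address and labels are illustrative)
node_exporter_address '192.168.122.217:10250'
labelmap_node_labels \
  '__meta_kubernetes_node_name=k8s-01' \
  '__meta_kubernetes_node_label_kubernetes_io_hostname=k8s-01' \
  '__meta_kubernetes_node_label_kubernetes_io_os=linux'
cadvisor_url k8s-01
</pre>
<p>Run with bash, the example prints <code>192.168.122.217:9100</code>, the two new labels <code>kubernetes_io_hostname=k8s-01</code> and <code>kubernetes_io_os=linux</code>, and <code>https://kubernetes.default.svc:443/api/v1/nodes/k8s-01/proxy/metrics/cadvisor</code>. The cAdvisor data behind that last URL can also be fetched by hand with <code>kubectl get --raw /api/v1/nodes/k8s-01/proxy/metrics/cadvisor</code>, provided your user may access the nodes/proxy resource.</p>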
</div> <p>We can also look at the data on the <code>Graph</code> page:</p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191205145348610-395814471.png" alt=""></p> <p>We can also enter our own expressions there:</p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191205145650901-902815516.png" alt=""></p> <h2>Container monitoring</h2> <p>cAdvisor is a container resource monitoring tool that covers container memory, CPU, network I/O, disk I/O and other resources, and it also provides a web page for viewing the real-time status of containers.</p> <p>cAdvisor is built into the kubelet, so it does not need to be installed separately. Through the apiserver, its data path is <code>/api/v1/nodes/&lt;node-name&gt;/proxy/metrics/cadvisor</code>.</p> <p>With action <code>labelkeep</code> or <code>labeldrop</code>, the labels of a target can be filtered: <code>labelkeep</code> keeps only the labels whose names match the regex, and <code>labeldrop</code> removes the matching ones.</p> <div class="cnblogs_code"> <pre>- job_name: 'kubernetes-cadvisor'
  kubernetes_sd_configs:
    - role: node
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - target_label: __address__
      replacement: kubernetes.default.svc:443
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor</pre> </div> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191205152935575-325082905.png" alt=""></p> <p>The certificate path configured in <code>tls_config</code> is the one every Pod uses to connect to the apiserver, so it is essentially fixed. We also added a <code>labelmap</code> rule. The last two rules point <code>__address__</code> at the apiserver (<code>kubernetes.default.svc:443</code>) and use a regex to rewrite <code>__metrics_path__</code> to the node's cAdvisor metrics path.</p> <blockquote> <p>This certificate is injected into the Pod by the kubelet when the Pod starts; every Pod gets a CA certificate injected this way.</p> </blockquote> <blockquote> <p>To read information from the apiserver, a token file (<code>bearer_token_file</code>) must also be configured.</p> </blockquote> <p>After making the change, apply the <code>configmap</code> and hot-reload it with curl (this is slow, so wait a moment):</p> <div class="cnblogs_code"> <pre>[root@k8s-01 prometheus]# kubectl apply -f prometheus.configmap.yaml
configmap/prometheus-config configured
[root@k8s-01 prometheus]# curl -X POST http://10.1.183.250:9090/-/reload
[root@k8s-01 prometheus]# curl -X POST http://10.1.183.250:9090/-/reload</pre> </div> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191205153043302-155663901.png" alt=""></p> <p>Now we can query container data on the <code>Graph</code> page.</p> <blockquote> <p>This example queries the CPU usage of all Pods in the cluster from the <code>container_cpu_usage_seconds_total</code> metric over the last 1 minute.</p> </blockquote> <p>The two queries below apply the same label filter without and with the <code>rate</code> function. The filter <code>image!=""</code> drops series that have no container image (such as the aggregated cgroup-level ones), and <code>pod!=""</code> keeps only series that belong to a Pod:</p> <div class="cnblogs_code"> <pre>container_cpu_usage_seconds_total{image!="",pod!=""}
rate(container_cpu_usage_seconds_total{image!="",pod!=""}[1m])</pre> </div> <p>Run the following query to get the per-second rate over the last 1 minute:</p> <p><code>rate(container_cpu_usage_seconds_total{image!="",pod!=""}[1m])</code></p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191205161803913-1946744961.png" alt=""></p> <p>We can also wrap the query in <code>sum</code> to get each Pod's CPU usage rate over 1 minute, with the Pod name shown as a label:</p> <div class="cnblogs_code"> <pre>sum by (pod) (rate(container_cpu_usage_seconds_total{image!="", pod!=""}[1m]))</pre> </div> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191205162408938-1312050103.png" alt=""></p> <h2>API Server monitoring</h2> <p>As the most central Kubernetes component, the apiserver definitely needs to be monitored. Its metrics can be obtained directly through the <code>kubernetes</code> Service:</p> <div class="cnblogs_code"> <pre>[root@k8s-01 prometheus]# kubectl get svc --all-namespaces
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.1.0.1 &lt;none&gt; 443/TCP 31d
ingress-nginx ingress-nginx ClusterIP 10.1.216.99 &lt;none&gt; 80/TCP,443/TCP 24h
kube-system kube-dns ClusterIP 10.1.0.10 &lt;none&gt; 53/UDP,53/TCP,9153/TCP 30d
kube-system kubelet ClusterIP None &lt;none&gt; 10250/TCP 25h
kube-system prometheus NodePort 10.1.183.250 &lt;none&gt; 9090:30129/TCP 4h38m
kubernetes-dashboard dashboard-metrics-scraper ClusterIP 10.1.100.76 &lt;none&gt; 8000/TCP 30d
kubernetes-dashboard kubernetes-dashboard NodePort 10.1.158.92 &lt;none&gt; 443:30001/TCP 30d</pre> </div> <p>The <code>kubernetes</code> Service in the <code>default</code> namespace is the in-cluster address of the apiserver. To discover Service-backed targets automatically, use <code>kubernetes_sd_configs</code> with the <code>endpoints</code> role; we only need to add an endpoints-role job to the ConfigMap:</p> <div class="cnblogs_code"> <pre>- job_name: 'kubernetes-apiserver'
  kubernetes_sd_configs:
    - role: endpoints</pre> </div> <p><img 
src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191205162835438-1844785288.png" alt=""></p> <p> </p> <p> </p> <p>刷新配置文件</p> <div class="cnblogs_code"><div class="cnblogs_code_toolbar"><span class="cnblogs_code_copy"><a href="javascript:void(0);" onclick="copyCnblogsCode(this)" title="复制代码"><img src="//common.cnblogs.com/images/copycode.gif" alt="复制代码"></a></span></div> <pre>[root@k8s-<span style="color: #800080;">01</span> prometheus]# kubectl get svc -n kube-<span style="color: #000000;">system NAME TYPE CLUSTER</span>-IP EXTERNAL-<span style="color: #000000;">IP PORT(S) AGE kube</span>-dns ClusterIP <span style="color: #800080;">10.1</span>.<span style="color: #800080;">0.10</span> <none> <span style="color: #800080;">53</span>/UDP,<span style="color: #800080;">53</span>/TCP,<span style="color: #800080;">9153</span>/<span style="color: #000000;">TCP 30d kubelet ClusterIP None </span><none> <span style="color: #800080;">10250</span>/<span style="color: #000000;">TCP 25h prometheus NodePort </span><span style="color: #800080;">10.1</span>.<span style="color: #800080;">183.250</span> <none> <span style="color: #800080;">9090</span>:<span style="color: #800080;">30129</span>/<span style="color: #000000;">TCP 4h43m [root@k8s</span>-<span style="color: #800080;">01</span> prometheus]# kubectl apply -<span style="color: #000000;">f prometheus.configmap.yaml configmap</span>/prometheus-<span style="color: #000000;">config configured [root@k8s</span>-<span style="color: #800080;">01</span> prometheus]# curl -X POST http:<span style="color: #008000;">//</span><span style="color: #008000;">10.1.183.250:9090/-/reload</span> [root@k8s-<span style="color: #800080;">01</span> prometheus]# curl -X POST http:<span style="color: #008000;">//</span><span style="color: #008000;">10.1.183.250:9090/-/reload</span> [root@k8s-<span style="color: #800080;">01</span> prometheus]# </pre> <div class="cnblogs_code_toolbar"><span class="cnblogs_code_copy"><a 
href="javascript:void(0);" onclick="copyCnblogsCode(this)" title="复制代码"><img src="//common.cnblogs.com/images/copycode.gif" alt="复制代码"></a></span></div></div> <p>更新完成后,我们可以看到kubernetes-apiserver下面出现了很多实例,这是因为我们这里使用的Endpoints类型的服务发现,所以prometheus把所有的Endpoints服务都抓取过来了,同样的我们要监控的kubernetes也在列表中。</p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191205163127444-1621067532.png" alt=""></p> <p> </p> <p> </p> <p>这里我们使用<code>keep</code>动作,将符合配置的保留下来,例如我们过滤default命名空间下服务名称为<code>kubernetes</code>的元数据,这里可以根据<code>__meta_kubernetes_namespace</code>和<code>__mate_kubertnetes_service_name</code>2个元数据进行relabel</p> <div class="cnblogs_code"><div class="cnblogs_code_toolbar"><span class="cnblogs_code_copy"><a href="javascript:void(0);" onclick="copyCnblogsCode(this)" title="复制代码"><img src="//common.cnblogs.com/images/copycode.gif" alt="复制代码"></a></span></div> <pre> - job_name: kubernetes-<span style="color: #000000;">apiservers kubernetes_sd_configs: </span>-<span style="color: #000000;"> role: endpoints relabel_configs: </span>-<span style="color: #000000;"> action: keep regex: default;kubernetes;https source_labels: </span>-<span style="color: #000000;"> __meta_kubernetes_namespace </span>-<span style="color: #000000;"> __meta_kubernetes_service_name </span>-<span style="color: #000000;"> __meta_kubernetes_endpoint_port_name scheme: https tls_config: ca_file: </span>/var/run/secrets/kubernetes.io/serviceaccount/<span style="color: #000000;">ca.crt insecure_skip_verify: </span><span style="color: #0000ff;">true</span><span style="color: #000000;"> bearer_token_file: </span>/var/run/secrets/kubernetes.io/serviceaccount/token</pre> <div class="cnblogs_code_toolbar"><span class="cnblogs_code_copy"><a href="javascript:void(0);" onclick="copyCnblogsCode(this)" title="复制代码"><img src="//common.cnblogs.com/images/copycode.gif" alt="复制代码"></a></span></div></div> <div class="cnblogs_code"> <pre></pre> <pre><span style="color: #000000;">#参数解释 action: keep #保留哪些标签 regex: 
default;kubernetes;https #匹配namespace下的default命名空间下的kubernetes service 最后https协议 可以通过`kubectl describe svc kubernetes`查看到</span></pre> </div> <p>刷新配置</p> <pre class="ql-syntax">#这个过程比较慢,可能要等几分钟,可以多reload几次</pre> <div class="cnblogs_code"> <pre>[root@k8s-<span style="color: #800080;">01</span> prometheus]# kubectl apply -<span style="color: #000000;">f prometheus.configmap.yaml configmap</span>/prometheus-<span style="color: #000000;">config configured [root@k8s</span>-<span style="color: #800080;">01</span> prometheus]# curl -X POST http:<span style="color: #008000;">//</span><span style="color: #008000;">10.1.183.250:9090/-/reload</span> [root@k8s-<span style="color: #800080;">01</span> prometheus]# curl -X POST http:<span style="color: #008000;">//</span><span style="color: #008000;">10.1.183.250:9090/-/reload</span> [root@k8s-<span style="color: #800080;">01</span> prometheus]# curl -X POST http:<span style="color: #008000;">//</span><span style="color: #008000;">10.1.183.250:9090/-/reload</span> [root@k8s-<span style="color: #800080;">01</span> prometheus]# </pre> </div> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191205164646484-1022817461.png" alt=""></p> <p> </p> <p> </p> <p>接下来我们还是前往<code>Greph</code>上查看采集到的数据</p> <div class="cnblogs_code"> <pre><span style="color: #0000ff;">sum</span><span style="color: #000000;">(rate(apiserver_request_count[1m]))
#这里使用的promql里面的rate和sun函数,意思是apiserver在1分钟内请求的数</span></pre>
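<pre># A hedged variant (assumption, not part of the original setup): apiserver_request_count
# was deprecated around Kubernetes 1.14 in favor of apiserver_request_total, so on newer
# clusters the query above may return no data. The same request rate using the newer
# metric, split by HTTP response code, would look like this:
sum by (code) (rate(apiserver_request_total[1m]))</pre>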
</div> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191205164858651-1223455278.png" alt=""></p> <p> </p> <p> </p> <p>如果我们要监控其他系统组件,比如kube-controller-manager、kube-scheduler的话就需要单独手动创建service,因为apiserver服务默认在default,而其他组件在kube-steam这个namespace下。其中kube-sheduler的指标数据端口为<code>10251</code>,kube-controller-manager对应端口为<code>10252</code></p> <p><code> </code></p> <h2>Service 监控</h2> <p>apiserver实际上是一种特殊的Service,现在配置一个专门发现普通类型的Service</p> <p>这里我们对service进行过滤,只有在service配置了<code>prometheus.io/scrape: "true"</code>过滤出来</p> <div class="cnblogs_code"><div class="cnblogs_code_toolbar"><span class="cnblogs_code_copy"><a href="javascript:void(0);" onclick="copyCnblogsCode(this)" title="复制代码"><img src="//common.cnblogs.com/images/copycode.gif" alt="复制代码"></a></span></div> <pre> - job_name: <span style="color: #800000;">'</span><span style="color: #800000;">kubernetes-service-endpoints</span><span style="color: #800000;">'</span><span style="color: #000000;"> kubernetes_sd_configs: </span>-<span style="color: #000000;"> role: endpoints relabel_configs: </span>-<span style="color: #000000;"> source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape] action: keep regex: </span><span style="color: #0000ff;">true</span> -<span style="color: #000000;"> source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme] action: replace target_label: __scheme__ regex: (https</span>?<span style="color: #000000;">) </span>-<span style="color: #000000;"> source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path] action: replace target_label: __metrics_path__ regex: (.</span>+<span style="color: #000000;">) </span>-<span style="color: #000000;"> source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port] action: replace target_label: __address__ regex: ([</span>^:]+)(?::\d+)?;(\d+<span style="color: #000000;">) replacement: $</span><span style="color: #800080;">1</span>:$<span style="color: #800080;">2</span> 
-<span style="color: #000000;"> action: labelmap regex: __meta_kubernetes_service_label_(.</span>+<span style="color: #000000;">) </span>-<span style="color: #000000;"> source_labels: [__meta_kubernetes_namespace] action: replace target_label: kubernetes_namespace </span>-<span style="color: #000000;"> source_labels: [__meta_kubernetes_service_name] action: replace target_label: kubernetes_name</span></pre> <div class="cnblogs_code_toolbar"><span class="cnblogs_code_copy"><a href="javascript:void(0);" onclick="copyCnblogsCode(this)" title="复制代码"><img src="//common.cnblogs.com/images/copycode.gif" alt="复制代码"></a></span></div></div> <p>继续重复步骤,刷新配置</p> <div class="cnblogs_code"> <pre>[root@k8s-<span style="color: #800080;">01</span><span style="color: #000000;"> prometheus]# [root@k8s</span>-<span style="color: #800080;">01</span> prometheus]# kubectl apply -<span style="color: #000000;">f prometheus.configmap.yaml configmap</span>/prometheus-<span style="color: #000000;">config configured [root@k8s</span>-<span style="color: #800080;">01</span> prometheus]# curl -X POST http:<span style="color: #008000;">//</span><span style="color: #008000;">10.1.183.250:9090/-/reload</span> [root@k8s-<span style="color: #800080;">01</span> prometheus]# curl -X POST http:<span style="color: #008000;">//</span><span style="color: #008000;">10.1.183.250:9090/-/reload</span> [root@k8s-<span style="color: #800080;">01</span> prometheus]# </pre> </div> <p>Serivce自动发现参数说明 (并不是所有创建的service都可以被prometheus发现)</p> <div class="cnblogs_code"><div class="cnblogs_code_toolbar"><span class="cnblogs_code_copy"><a href="javascript:void(0);" onclick="copyCnblogsCode(this)" title="复制代码"><img src="//common.cnblogs.com/images/copycode.gif" alt="复制代码"></a></span></div> <pre>#<span style="color: #800080;">1</span><span style="color: #000000;">.参数解释 relabel_configs: </span>-<span style="color: #000000;">source_labels:[__meta_kubernetes_service_annotation_prometheus_io_scrape] action: keep regex: 
</span><span style="color: #0000ff;">true</span><span style="color: #000000;"> 保留标签 source_labels: [__meta_kubernetes_service_annotation_prometheus_io_cheme]
这行配置代表我们只去筛选有__meta_kubernetes_service_annotation_prometheus_io_scrape的service,只有添加了这个声明才可以自动发现其他service
#</span><span style="color: #800080;">2</span><span style="color: #000000;">.参数解释 </span>-<span style="color: #000000;"> source_labels: [address, __meta_kubernetes_service_annotation_prometheus_io_port] action: replace target_label: address regex: ([</span>^:]+)(?::\d+)?;(\d+<span style="color: #000000;">) replacement: $</span><span style="color: #800080;">1</span>:$<span style="color: #800080;">2</span><span style="color: #000000;"> #指定一个抓取的端口,有的service可能有多个端口(比如之前的redis)。默认使用的是我们添加是使用kubernetes_service端口
#</span><span style="color: #800080;">3</span><span style="color: #000000;">.参数解释 </span>-<span style="color: #000000;"> source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme] action: replace target_label: scheme regex: (https</span>?<span style="color: #000000;">) #这里如果是https证书类型,我们还需要在添加证书和token</span></pre>
<div class="cnblogs_code_toolbar"><span class="cnblogs_code_copy"><a href="javascript:void(0);" onclick="copyCnblogsCode(this)" title="复制代码"><img src="//common.cnblogs.com/images/copycode.gif" alt="复制代码"></a></span></div></div> <p>我们可以看到这里的服务的<code>core DNS</code>,为什么那么多service只有coreDNS可以被收集到呢?</p> <p> </p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191205165511757-773918813.png" alt=""></p> <p> </p> <p> </p> <p>上面也说了,我们有过滤条件,只有复合条件的才进行过滤</p> <blockquote> <p>core DNS serviceYaml 文件包含<code>true</code>参数,所以会被匹配到</p> </blockquote> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191205165943118-432606321.png" alt=""></p> <p> </p> <p> </p> <p>继续重复步骤,刷新配置</p> <div class="cnblogs_code"><div class="cnblogs_code_toolbar"><span class="cnblogs_code_copy"><a href="javascript:void(0);" onclick="copyCnblogsCode(this)" title="复制代码"><img src="//common.cnblogs.com/images/copycode.gif" alt="复制代码"></a></span></div> <pre>[root@k8s-<span style="color: #800080;">01</span><span style="color: #000000;"> prometheus]# [root@k8s</span>-<span style="color: #800080;">01</span> prometheus]# kubectl apply -<span style="color: #000000;">f prometheus.configmap.yaml configmap</span>/prometheus-<span style="color: #000000;">config configured [root@k8s</span>-<span style="color: #800080;">01</span> prometheus]# curl -X POST http:<span style="color: #008000;">//</span><span style="color: #008000;">10.1.183.250:9090/-/reload</span> [root@k8s-<span style="color: #800080;">01</span> prometheus]# curl -X POST http:<span style="color: #008000;">//</span><span style="color: #008000;">10.1.183.250:9090/-/reload</span> [root@k8s-<span style="color: #800080;">01</span> prometheus]# curl -X POST http:<span style="color: #008000;">//</span><span style="color: #008000;">10.1.183.250:9090/-/reload</span> [root@k8s-<span style="color: #800080;">01</span> prometheus]# </pre> <div class="cnblogs_code_toolbar"><span class="cnblogs_code_copy"><a href="javascript:void(0);" 
onclick="copyCnblogsCode(this)" title="复制代码"><img src="//common.cnblogs.com/images/copycode.gif" alt="复制代码"></a></span></div></div> <p>当我们再次查看,发现状态完成</p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191205170521230-2125899314.png" alt=""></p> <p> </p> <p> </p> <h2> </h2> <h1 class="entry-title"> </h1> <div id="entry-content" class="entry-content pos-r pd20"> <div id="content-innerText"> <div class="post-excerpt mar20-b pos-r mar20-t">Grafana是一个跨平台的开源的度量分析和可视化工具,可以通过将采集的数据查询然后可视化的展示,并及时通知。</div> <h1>Grafana 安装并监控k8s集群</h1> <p>由于Prometheus自带的web Ui图标功能相对较弱,所以一般情况下我们会使用一个第三方的工具来展示这些数据</p> <p>Grafana介绍</p> <p>grafana 是一个可视化面包,有着非常漂亮的图片和布局展示,功能齐全的度量仪表盘和图形化编辑器,支持Graphite、Zabbix、InfluxDB、Prometheus、OpenTSDB、Elasticasearch等作为数据源,比Prometheus自带的图标展示功能强大很多,更加灵活,有丰富的插件</p> <p>我们这里使用deployment持久化安装grafana</p> <div class="cnblogs_code"><div class="cnblogs_code_toolbar"><span class="cnblogs_code_copy"><a href="javascript:void(0);" onclick="copyCnblogsCode(this)" title="复制代码"><img src="//common.cnblogs.com/images/copycode.gif" alt="复制代码"></a></span></div> <pre><span style="color: #0000ff;">cat</span> >>grafana_deployment.yaml <<<span style="color: #000000;">EOF apiVersion: apps</span>/<span style="color: #000000;">v1 kind: Deployment metadata: name: grafana namespace: kube</span>-<span style="color: #000000;">system labels: app: grafana k8s</span>-<span style="color: #000000;">app: grafana spec: selector: matchLabels: k8s</span>-<span style="color: #000000;">app: grafana app: grafana revisionHistoryLimit: </span><span style="color: #800080;">10</span><span style="color: #000000;"> template: metadata: labels: app: grafana k8s</span>-<span style="color: #000000;">app: grafana spec: containers: </span>-<span style="color: #000000;"> name: grafana image: grafana</span>/grafana:<span style="color: #800080;">5.3</span>.<span style="color: #800080;">4</span><span style="color: #000000;"> imagePullPolicy: IfNotPresent ports: </span>- containerPort: <span 
style="color: #800080;">3000</span><span style="color: #000000;"> name: grafana </span><span style="color: #0000ff;">env</span><span style="color: #000000;">: </span>-<span style="color: #000000;"> name: GF_SECURITY_ADMIN_USER value: admin </span>-<span style="color: #000000;"> name: GF_SECURITY_ADMIN_PASSWORD value: jiangwenhui readinessProbe: failureThreshold: </span><span style="color: #800080;">10</span><span style="color: #000000;"> httpGet: path: </span>/api/<span style="color: #000000;">health port: </span><span style="color: #800080;">3000</span><span style="color: #000000;"> scheme: HTTP initialDelaySeconds: </span><span style="color: #800080;">60</span><span style="color: #000000;"> periodSeconds: </span><span style="color: #800080;">10</span><span style="color: #000000;"> successThreshold: </span><span style="color: #800080;">1</span><span style="color: #000000;"> timeoutSeconds: </span><span style="color: #800080;">30</span><span style="color: #000000;"> livenessProbe: failureThreshold: </span><span style="color: #800080;">3</span><span style="color: #000000;"> httpGet: path: </span>/api/<span style="color: #000000;">health port: </span><span style="color: #800080;">3000</span><span style="color: #000000;"> scheme: HTTP periodSeconds: </span><span style="color: #800080;">10</span><span style="color: #000000;"> successThreshold: </span><span style="color: #800080;">1</span><span style="color: #000000;"> timeoutSeconds: </span><span style="color: #800080;">1</span><span style="color: #000000;"> resources: limits: cpu: 300m memory: 1024Mi requests: cpu: 300m memory: 1024Mi volumeMounts: </span>- mountPath: /var/lib/<span style="color: #000000;">grafana subPath: grafana name: storage securityContext: fsGroup: </span><span style="color: #800080;">472</span><span style="color: #000000;"> runAsUser: </span><span style="color: #800080;">472</span><span style="color: #000000;"> volumes: </span>-<span style="color: #000000;"> name: storage persistentVolumeClaim: 
claimName: grafana EOF</span></pre> <div class="cnblogs_code_toolbar"><span class="cnblogs_code_copy"><a href="javascript:void(0);" onclick="copyCnblogsCode(this)" title="复制代码"><img src="//common.cnblogs.com/images/copycode.gif" alt="复制代码"></a></span></div></div> <p> </p> <p>这里使用了<code>grafana 5.3.4</code>的镜像,添加了监控检查、资源声明,比较重要的变量是<code>GF_SECURITY_ADMIN_USER</code>和<code>GF_SECURITY_ADMIN_PASSWORD</code>为grafana的账号和密码。</p> <p>由于grafana将dashboard、插件这些数据保留在<code>/var/lib/grafana</code>目录下,所以我们这里需要做持久化,同时要针对这个目录做挂载声明,由于5.3.4版本用户的userid和groupid都有所变化,所以这里添加了一个<code>securityContext</code>设置用户ID</p> <p><img src="http://static.zybuluo.com/abcdocker/e5pftmwx8nw435d0dn2yfn8c/image_1ddnv749l17k7ucdel1m4v17jjea.png" alt="image_1ddnv749l17k7ucdel1m4v17jjea.png-56.5kB"></p> <p>现在我们添加一个pv和pvc用于绑定grafana</p> <div class="cnblogs_code"><div class="cnblogs_code_toolbar"><span class="cnblogs_code_copy"><a href="javascript:void(0);" onclick="copyCnblogsCode(this)" title="复制代码"><img src="//common.cnblogs.com/images/copycode.gif" alt="复制代码"></a></span></div> <pre><span style="color: #0000ff;">cat</span> >>grafana_volume.yaml <<<span style="color: #000000;">EOF apiVersion: v1 kind: PersistentVolume metadata: name: grafana spec: capacity: storage: 10Gi accessModes: </span>-<span style="color: #000000;"> ReadWriteOnce persistentVolumeReclaimPolicy: Recycle nfs: server: </span><span style="color: #800080;">192.168</span>.<span style="color: #800080;">0.200</span><span style="color: #000000;"> path: </span>/home/kvm/k8s-<span style="color: #000000;">vloume </span>---<span style="color: #000000;"> apiVersion: v1 kind: PersistentVolumeClaim metadata: name: grafana namespace: kube</span>-<span style="color: #000000;">system spec: accessModes: </span>-<span style="color: #000000;"> ReadWriteOnce resources: requests: storage: 10Gi
EOF</span></pre>
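<pre># Sketch (assumption, not the author's original manifest): a PVC only requests a size and an
# access mode, so it may bind to any available PV that satisfies them. Setting spec.volumeName
# pins the claim to the grafana PV, and storageClassName: "" stops a default StorageClass
# (if the cluster has one) from provisioning a new volume instead. This would replace the
# PersistentVolumeClaim part of grafana_volume.yaml above.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana
  namespace: kube-system
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: ""
  volumeName: grafana
  resources:
    requests:
      storage: 10Gi</pre>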
<div class="cnblogs_code_toolbar"><span class="cnblogs_code_copy"><a href="javascript:void(0);" onclick="copyCnblogsCode(this)" title="复制代码"><img src="//common.cnblogs.com/images/copycode.gif" alt="复制代码"></a></span></div></div> <p>这里配置依旧使用NFS进行挂载使用</p> <p>现在我们还需要创建一个service,使用NodePort</p> <div class="cnblogs_code"><div class="cnblogs_code_toolbar"><span class="cnblogs_code_copy"><a href="javascript:void(0);" onclick="copyCnblogsCode(this)" title="复制代码"><img src="//common.cnblogs.com/images/copycode.gif" alt="复制代码"></a></span></div> <pre><span style="color: #0000ff;">cat</span> >>grafana_svc.yaml<<<span style="color: #000000;">EOF apiVersion: v1 kind: Service metadata: name: grafana namespace: kube</span>-<span style="color: #000000;">system labels: app: grafana spec: type: NodePort ports: </span>- port: <span style="color: #800080;">3000</span><span style="color: #000000;"> selector: app: grafana EOF</span></pre> <div class="cnblogs_code_toolbar"><span class="cnblogs_code_copy"><a href="javascript:void(0);" onclick="copyCnblogsCode(this)" title="复制代码"><img src="//common.cnblogs.com/images/copycode.gif" alt="复制代码"></a></span></div></div> <p>由于5.1(可以选择5.1之前的docker镜像,可以避免此类错误)版本后groupid更改,同时我们将<code>/var/lib/grafana</code>挂载到pvc后,目录拥有者可能不是grafana用户,所以我们还需要添加一个Job用于授权目录</p> <div class="cnblogs_code"><div class="cnblogs_code_toolbar"><span class="cnblogs_code_copy"><a href="javascript:void(0);" onclick="copyCnblogsCode(this)" title="复制代码"><img src="//common.cnblogs.com/images/copycode.gif" alt="复制代码"></a></span></div> <pre><span style="color: #0000ff;">cat</span> > grafana_job.yaml <<<span style="color: #000000;">EOF apiVersion: batch</span>/<span style="color: #000000;">v1 kind: Job metadata: name: grafana</span>-<span style="color: #0000ff;">chown</span><span style="color: #000000;"> namespace: kube</span>-<span style="color: #000000;">system spec: template: spec: restartPolicy: Never containers: </span>- name: grafana-<span style="color: #0000ff;">chown</span><span 
style="color: #000000;"> command: [</span><span style="color: #800000;">"</span><span style="color: #800000;">chown</span><span style="color: #800000;">"</span>, <span style="color: #800000;">"</span><span style="color: #800000;">-R</span><span style="color: #800000;">"</span>, <span style="color: #800000;">"</span><span style="color: #800000;">472:472</span><span style="color: #800000;">"</span>, <span style="color: #800000;">"</span><span style="color: #800000;">/var/lib/grafana</span><span style="color: #800000;">"</span><span style="color: #000000;">] image: busybox imagePullPolicy: IfNotPresent volumeMounts: </span>-<span style="color: #000000;"> name: storage subPath: grafana mountPath: </span>/var/lib/<span style="color: #000000;">grafana volumes: </span>-<span style="color: #000000;"> name: storage persistentVolumeClaim: claimName: grafana EOF</span></pre> <div class="cnblogs_code_toolbar"><span class="cnblogs_code_copy"><a href="javascript:void(0);" onclick="copyCnblogsCode(this)" title="复制代码"><img src="//common.cnblogs.com/images/copycode.gif" alt="复制代码"></a></span></div></div> <p>这里使用一个busybox镜像将<code>/var/lib/grafana</code>目录修改为权限<code>472</code></p> <div class="cnblogs_code"><div class="cnblogs_code_toolbar"><span class="cnblogs_code_copy"><a href="javascript:void(0);" onclick="copyCnblogsCode(this)" title="复制代码"><img src="//common.cnblogs.com/images/copycode.gif" alt="复制代码"></a></span></div> <pre><span style="color: #000000;">#需要先创建pv和pvc (这里是需要安装顺序来创建) [root@k8s</span>-<span style="color: #800080;">01</span> prometheus]# kubectl create -<span style="color: #000000;">f grafana_volume.yaml persistentvolume</span>/<span style="color: #000000;">grafana created persistentvolumeclaim</span>/<span style="color: #000000;">grafana created [root@k8s</span>-<span style="color: #800080;">01</span> prometheus]# kubectl create -<span style="color: #000000;">f grafana_job.yaml job.batch</span>/grafana-<span style="color: #0000ff;">chown</span><span style="color: 
#000000;"> created</span><span style="color: #000000;"> [root@k8s</span>-<span style="color: #800080;">01</span> prometheus]# kubectl apply -<span style="color: #000000;">f grafana_deployment.yaml deployment.apps</span>/<span style="color: #000000;">grafana created</span><span style="color: #000000;"> [root@k8s</span>-<span style="color: #800080;">01</span> prometheus]# kubectl create -f grafana_svc.yaml</pre> <div class="cnblogs_code_toolbar"><span class="cnblogs_code_copy"><a href="javascript:void(0);" onclick="copyCnblogsCode(this)" title="复制代码"><img src="//common.cnblogs.com/images/copycode.gif" alt="复制代码"></a></span></div></div> <p>创建完成后我们打开grafana的dashboard界面</p> <div class="cnblogs_code"> <pre>[root@k8s-<span style="color: #800080;">01</span><span style="color: #000000;"> prometheus]# [root@k8s</span>-<span style="color: #800080;">01</span> prometheus]# kubectl get pod,svc -n kube-system |<span style="color: #0000ff;">grep</span><span style="color: #000000;"> grafana pod</span>/grafana-59bd6c446d-4jjnf <span style="color: #800080;">1</span>/<span style="color: #800080;">1</span> Running <span style="color: #800080;">0</span><span style="color: #000000;"> 7m39s pod</span>/grafana-<span style="color: #0000ff;">chown</span>-w562v <span style="color: #800080;">0</span>/<span style="color: #800080;">1</span> Completed <span style="color: #800080;">0</span><span style="color: #000000;"> 14m service</span>/grafana NodePort <span style="color: #800080;">10.1</span>.<span style="color: #800080;">63.182</span> <none> <span style="color: #800080;">3000</span>:<span style="color: #800080;">30636</span>/<span style="color: #000000;">TCP 13m [root@k8s</span>-<span style="color: #800080;">01</span> prometheus]# </pre> </div> <p>然后我们在任意集群中的节点访问端口为<code>30636</code></p> <blockquote> <p>这里的集群密码就是上面我们创建deployment里面设置的变量,我这里用户设置为admin密码jiangwenhui</p> </blockquote> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191205174032842-1579771422.png" alt=""></p> 
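 created</span>">
<p>The next steps add the Prometheus data source by hand in the web UI. As an alternative sketch (an assumption, not what this article does), Grafana 5.0 and later can also load data sources from provisioning files under <code>/etc/grafana/provisioning/datasources/</code>. You could, for example, store a file like the one below in a ConfigMap and mount it at that path in the Grafana Deployment. The file name <code>datasource.yaml</code> and the data source name <code>Prometheus</code> are placeholders.</p> <div class="cnblogs_code"> <pre># datasource.yaml (hypothetical name), placed in /etc/grafana/provisioning/datasources/
apiVersion: 1
datasources:
- name: Prometheus
  type: prometheus
  access: proxy     # the Grafana server, not the browser, sends the queries to Prometheus
  url: http://prometheus.kube-system.svc.cluster.local:9090
  isDefault: true</pre> </div>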
<p> </p> <p>After logging in, Grafana shows its getting-started page.</p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191205175407660-1036235675.png" alt=""></p> <p> </p> <p> </p> <p>The first time you use Grafana, you need to add a data source.</p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191205175433278-1359864607.png" alt=""></p> <p> </p> <p>Choose Prometheus as the type.</p> <p>For the URL, enter:</p> <blockquote> <p><a href="http://prometheus.kube-system.svc.cluster.local:9090/" target="_blank">http://prometheus.kube-system.svc.cluster.local:9090</a></p> </blockquote> <blockquote> <p>Here <code>prometheus</code> is the Service name</p> </blockquote> <blockquote> <p><code>kube-system</code> is the namespace, and <code>svc.cluster.local</code> is the default cluster DNS suffix</p> </blockquote> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191205175643960-1795014999.png" alt=""></p> <p> </p> <p> <img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191205180032331-1653416405.png" alt=""></p> <p> </p> <p> <img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191205180147168-1573544078.png" alt=""></p> <p>Once the data source has been added, create a <code>New dashboard</code>.</p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191206132456832-1957369909.png" alt=""></p> <p> </p> <p>Here you can build your own dashboard or use one written by someone else. A prebuilt dashboard usually needs some adjustments afterwards.</p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191206133106532-798765113.png" alt=""></p> <p> </p> <p>Grafana offers a large library of dashboards, somewhat like a Docker image registry, and importing one is very easy. Click <code>Dashboard</code> at the top.</p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191206133315288-2084711117.png" alt=""></p> <p> </p> <p> </p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191206133332002-128565488.png" alt=""></p> <p>The dashboards here are public and free to use.</p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191206133627793-95983797.png" alt=""></p> <p> </p> <p> </p> <p>Open any dashboard to see its ID, copy the ID, and go back to <code>grafana</code>.</p> <blockquote> <p>Here I add a dashboard that monitors a Kubernetes cluster. It shows overall cluster CPU, memory and disk usage, as well as per-pod statistics.</p> 
</blockquote> <blockquote> <p><a href="https://grafana.com/grafana/dashboards/8588">https://grafana.com/grafana/dashboards/8588</a></p> </blockquote> <p> <img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191206133912401-1251480789.png" alt=""></p> <p> </p> <p> </p> <p>点击导入模板</p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191206133953567-1947209760.png" alt=""></p> <p> </p> <p> 在这里我们输入8588或者url,会自动跳转到配置页面</p> <p><a href="https://grafana.com/grafana/dashboards/8588">https://grafana.com/grafana/dashboards/8588</a></p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191206134228778-1844790022.png" alt=""></p> <p> </p> <p> </p> <p>选择好数据源之后,我们在点击<code>Import</code>即可</p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191206134302303-1708274511.png" alt=""></p> <p> </p> <p> </p> <p>这里就会将模板<code>8588</code>给我们导入进行</p> <p>这里就会获取我们prometheus里面的数据了</p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191206134342368-583966067.png" alt=""></p> <p> </p> <p> </p> <p>现在的模板还没有进行保存,我们要点击保存一下</p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191206134411889-133357289.png" alt=""></p> <p> </p> <p> </p> <p>现在就保存下来了</p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191206134500835-1196826960.png" alt=""></p> <p> </p> <p> 目前我们导入模板之后是无法直接使用滴</p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191206134549048-87889846.png" alt=""></p> <p> </p> <p> </p> <p>这里无法显示是由于模板定义的标签,我们prometheus并没有这个数据元,所以说我们要对模板进行修改!</p> <blockquote> <p>在修改之前我们先设置一下时区,grafana默认走的是浏览器时区,但是prometheus使用的是<code>UTC</code>时区</p> </blockquote> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191206134649482-46060159.png" alt=""></p> <p> </p> <p> </p> <blockquote> <p>修改默认模板 (我这里使用的是8588模板,下面模板修改请根据我的操作步骤进行操作)</p> </blockquote> <p>grafana模板修改</p> <blockquote> <p>前面的步骤必须和我相同,否则这里可能会无法出现值</p> </blockquote> <p>首先我们进行编辑 <code>Cluster 
memory usage</code></p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191206134939674-1438960242.png" alt=""></p> <p> </p> <p> </p> <p>The calculation is (total cluster memory - (free memory + Buffers + Cached across the cluster)) / total cluster memory, expressed as a percentage:</p> <div class="cnblogs_code"> <pre>(<span style="color: #0000ff;">sum</span>(node_memory_MemTotal_bytes) - <span style="color: #0000ff;">sum</span>(node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / <span style="color: #0000ff;">sum</span>(node_memory_MemTotal_bytes) * <span style="color: #800080;">100</span></pre> </div> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191206135135216-887447068.png" alt=""></p> <p> </p> <p> </p> <p>Note that what you enter here is PromQL, the same query you can run in Prometheus. If Prometheus returns no data for a query, Grafana gets no data either.</p> <p>Here the query does return data in Prometheus:</p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191206135244211-904770227.png" alt=""></p> <p> </p> <p> </p> <p>Cluster CPU usage is configured as follows. The total CPU time per second used by all containers is divided by the number of CPU cores, counted as the <code>node_cpu_seconds_total{mode="system"}</code> series (one per core):</p> <div class="cnblogs_code"> <pre><span style="color: #0000ff;">sum</span>(<span style="color: #0000ff;">sum</span> by (container_name)(rate(container_cpu_usage_seconds_total{image!=""}[1m]))) / count(node_cpu_seconds_total{mode="system"}) * <span style="color: #800080;">100</span></pre> </div> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191206135342303-126018166.png" alt=""></p> <p> </p> <p> </p> <p>Cluster filesystem usage (cluster-wide filesystem utilization). Note that <code>device="tmpfs"</code> restricts the query to tmpfs (memory-backed) filesystems; change the label filter if you want to measure disk-backed filesystems instead:</p> <div class="cnblogs_code"> <pre>(<span style="color: #0000ff;">sum</span>(node_filesystem_size_bytes{device="tmpfs"}) - <span style="color: #0000ff;">sum</span>(node_filesystem_free_bytes{device="tmpfs"})) / <span style="color: #0000ff;">sum</span>(node_filesystem_size_bytes{device="tmpfs"}) * <span style="color: #800080;">100</span></pre> </div> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191206140124167-819265076.png" alt=""></p> <p> </p> <p> </p> <p>Now the panels show data.</p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191206141033947-1681764529.png" alt=""></p> <p> </p> <p> </p> <p>Next, configure the CPU usage of the Pods in the cluster:</p> <div class="cnblogs_code"> <pre><span style="color: #0000ff;">sum</span> by (pod) (rate(container_cpu_usage_seconds_total{image!="", pod!=""}[1m]))</pre> </div> <p>Set the Legend field below the query to:</p> <div class="cnblogs_code"> <pre>{{ pod }}</pre> </div> <p> <img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191206165530452-1368409643.png" alt=""></p> <p> </p> <p>Memory usage of the Pods in the cluster:</p> <div class="cnblogs_code"> <pre>sort_desc(<span style="color: #0000ff;">sum</span>(container_memory_usage_bytes{image!="", pod!=""}) by (pod))</pre> </div> <p>The legend is again <code>{{ pod }}</code>.</p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191206165851965-2001995984.png" alt=""></p> <p> </p> <p> </p> <p>Finally, configure Pod network monitoring:</p> <div class="cnblogs_code"><div class="cnblogs_code_toolbar"><span class="cnblogs_code_copy"><a href="javascript:void(0);" onclick="copyCnblogsCode(this)" title="复制代码"><img src="//common.cnblogs.com/images/copycode.gif" alt="复制代码"></a></span></div> <pre><span style="color: #800080;">1</span><span style="color: #000000;">. Inbound traffic
sort_desc(</span><span style="color: #0000ff;">sum</span> by (pod) (rate 
(container_network_receive_bytes_total{name!=<span style="color: #800000;">""</span><span style="color: #000000;">}[1m]) ))
</span><span style="color: #800080;">2</span><span style="color: #000000;">.出口流量 sort_desc(</span><span style="color: #0000ff;">sum</span> by (pod) (rate (container_network_transmit_bytes_total{name!=<span style="color: #800000;">""</span><span style="color: #000000;">}[1m]) ))
#监控时间为1分钟</span></pre>
<div class="cnblogs_code_toolbar"><span class="cnblogs_code_copy"><a href="javascript:void(0);" onclick="copyCnblogsCode(this)" title="复制代码"><img src="//common.cnblogs.com/images/copycode.gif" alt="复制代码"></a></span></div></div> <p> </p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191206171540984-490624665.png" alt=""></p>
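<p>The Pod CPU, memory and network panels all follow the same pattern. They drop series whose label is empty, apply <code>rate()</code> over a 1-minute window when the metric is a counter, and then apply <code>sum by (pod)</code> and <code>sort_desc</code>. The Python sketch below reproduces that pipeline on hypothetical in-memory samples to show what a panel value means. It is a simplification and not Prometheus's implementation. The real <code>rate()</code> also extrapolates to the edges of the window, whereas the helper <code>simple_rate</code> only divides the observed increase by the time between the first and last sample. The function names, pod names and sample values are all made up for illustration.</p> <div class="cnblogs_code"> <pre>
# Simplified, hypothetical model of:
#   sort_desc(sum by (pod) (rate(container_network_receive_bytes_total{name!=""}[1m])))
# All samples are assumed to already lie inside the 1-minute window.


def simple_rate(samples):
    """Per-second increase of a counter from [(timestamp_s, value), ...], oldest first."""
    if len(samples) in (0, 1):
        return None  # rate() yields no result with fewer than 2 samples
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A counter only drops when the process restarts; like Prometheus,
        # treat the new value as the increase since the reset.
        counter_reset = max(prev, cur) != cur
        increase += cur if counter_reset else cur - prev
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed  # no window-edge extrapolation (simplification)


def sum_by_pod_sorted(series, required_label="name"):
    """Drop series with an empty label, sum per-series rates by pod, sort descending."""
    totals = {}
    for s in series:
        labels = s["labels"]
        # name!="" : a missing label counts as the empty string, so it is dropped
        if labels.get(required_label, "") == "":
            continue
        r = simple_rate(s["samples"])
        if r is None:
            continue
        pod = labels.get("pod", "")
        totals[pod] = totals.get(pod, 0.0) + r
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical data: two named containers plus one series without a name label.
series = [
    {"labels": {"pod": "web-1", "name": "web"}, "samples": [(0, 100.0), (30, 1300.0), (60, 2500.0)]},
    {"labels": {"pod": "db-0", "name": "db"}, "samples": [(0, 5000.0), (30, 200.0), (60, 800.0)]},
    {"labels": {"pod": "web-1"}, "samples": [(0, 0.0), (60, 60000.0)]},
]
# web-1 (2400 bytes over 60 s = 40.0 bytes/s) ranks above db-0; the unnamed series is ignored
print(sum_by_pod_sorted(series))
</pre> </div> <p>Without the empty-label filter, the third series would be added to <code>web-1</code> and inflate its total. This is the same effect that the pod-level aggregate series have when the filter is written with a space.</p>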
<p>The final result is shown below. <code>Remember to click Save.</code></p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191206171002576-1669171595.png" alt=""></p> <p><img src="https://img2018.cnblogs.com/blog/981215/201912/981215-20191206171743545-1968909768.png" alt=""></p>
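<p>The three cluster-level gauges (memory, CPU and filesystem) each reduce to "used / total * 100". As a rough offline sanity check of those formulas, the Python sketch below applies the same arithmetic to hypothetical per-node numbers. The function names and sample values are illustrative only and are not part of Grafana or Prometheus.</p> <div class="cnblogs_code"> <pre>
def pct(used, total):
    # Every cluster gauge in this dashboard is used / total * 100.
    return used / total * 100


def cluster_memory_pct(nodes):
    """nodes: per-node node_memory_*_bytes values (hypothetical sample data)."""
    total = sum(n["MemTotal"] for n in nodes)
    # As in the PromQL, MemFree + Buffers + Cached counts as not in use.
    not_used = sum(n["MemFree"] + n["Buffers"] + n["Cached"] for n in nodes)
    return pct(total - not_used, total)


def cluster_cpu_pct(container_cpu_rates, cpu_cores):
    # Numerator: sum of rate(container_cpu_usage_seconds_total[1m]), i.e. busy cores.
    # Denominator: count(node_cpu_seconds_total{mode="system"}), one series per core.
    return pct(sum(container_cpu_rates), cpu_cores)


def cluster_fs_pct(sizes, frees):
    # (sum(size) - sum(free)) / sum(size) * 100
    return pct(sum(sizes) - sum(frees), sum(sizes))


nodes = [
    {"MemTotal": 8000, "MemFree": 2000, "Buffers": 500, "Cached": 1500},
    {"MemTotal": 8000, "MemFree": 4000, "Buffers": 1000, "Cached": 1000},
]
print(cluster_memory_pct(nodes))  # (16000 - 10000) / 16000 * 100 = 37.5
print(cluster_cpu_pct([0.5, 1.5], 8))  # 2 busy cores out of 8 = 25.0
</pre> </div> <p>Because Buffers and Cached are counted as free, this memory formula reports lower usage than a plain MemTotal - MemFree calculation would.</p>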
<p>Every PromQL query above returns data when run directly in Prometheus.</p>
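<p>A panel can only show what Prometheus returns. A quick way to check a query before pasting it into Grafana is to send it to the Prometheus HTTP API instant-query endpoint <code>/api/v1/query</code>. The Python sketch below assumes Prometheus is reachable at <code>http://localhost:9090</code>, for example through <code>kubectl port-forward</code>. That address and the helper names are assumptions, so adapt them to your setup.</p> <div class="cnblogs_code"> <pre>
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus is reachable here (e.g. via kubectl port-forward).
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(expr, base=PROMETHEUS_URL):
    # Instant query: the PromQL expression is passed URL-encoded as ?query=
    return base + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def result_values(body):
    """Return [(labels, float_value), ...] from an instant-query response body."""
    if body.get("status") != "success":
        raise RuntimeError(body.get("error", "query failed"))
    # Each vector element carries "value": [unix_timestamp, "value_as_string"]
    return [(r["metric"], float(r["value"][1])) for r in body["data"]["result"]]


def run_query(expr, base=PROMETHEUS_URL):
    # Performs the HTTP request; needs a reachable Prometheus server.
    with urllib.request.urlopen(build_query_url(expr, base), timeout=5) as resp:
        return result_values(json.load(resp))


# Example (needs a running Prometheus):
#   run_query('sum(node_memory_MemTotal_bytes)')
</pre> </div> <p>An empty list means the expression matched no series, so the corresponding Grafana panel will stay empty as well.</p>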
</div> </div>
</div>
Source: oschina
Link: https://my.oschina.net/u/4279343/blog/3331414