Prometheus

Creating Prometheus and Grafana Alerting Services and Integrating Them with the Tencent Cloud SMS Alerting Platform (prometheus_alert)

Submitted by 天涯浪子 on 2021-02-20 13:26:39
Preface: in a monitoring system, if the data pipeline is the skeleton, then the alert notification service is the soul. Every monitoring service exists to surface problems promptly: to reduce manual status checks, catch issues early, and avoid unnecessary large-scale failures, saving money for enterprises and government agencies and keeping systems safe. Detecting a problem matters, but getting someone to know about it quickly matters even more, and that is today's topic: the alert notification service. The open-source project PrometheusAlert integrates with many third-party services to deliver alerts by phone call, SMS, and other channels; it is the tool we will use here, so let's deploy it first. See the project on GitHub; deployment is covered in the "Deployment" section of its README.md. Note that its configuration file must sit in the binary's working directory and be named conf/app.conf, or it will not be read. The reason is that the project is built on the beego framework, which reads its configuration from that path by default. If none of the prebuilt binaries fits your platform, you can build one yourself: GOPATH=xxxx/monitor_alert CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o xxx/monitor_alert/bin/PrometheusAlertLinuxAmd64 xxx/monitor_alert/src/PrometheusAlert/PrometheusAlert.go GOPATH=xxxx/monitor_alert CGO_ENABLED=0 GOOS=linux
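
Once PrometheusAlert is up, the usual wiring is for Alertmanager to forward firing alerts to it through a webhook receiver. The fragment below is only a minimal sketch: the listen address is a placeholder, and the exact path and query parameters (which select the Tencent Cloud SMS template) come from your own PrometheusAlert configuration, not from this article.

```yaml
# alertmanager.yml (fragment) - hand alerts off to PrometheusAlert
route:
  receiver: prometheusalert-sms        # default route; must match a receiver name below
receivers:
  - name: prometheusalert-sms
    webhook_configs:
      # placeholder URL: the real endpoint and its query parameters are defined
      # by the PrometheusAlert templates you configure for Tencent Cloud SMS
      - url: 'http://127.0.0.1:8080/prometheusalert'
```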

Detailed Answers to the 17 Core Questions on "Tens-of-Times Performance Optimization Practice for a Trillion-Record MongoDB Cluster"

Submitted by 好久不见. on 2021-02-20 10:33:09
Detailed answers to the 17 core questions on "Tens-of-Times Performance Optimization Practice for a Trillion-Record MongoDB Cluster". Note: for the background, please first read the article "Tens-of-Times Performance Improvement and Multi-Datacenter Active-Active Disaster Recovery Practice for a Trillion-Record MongoDB Cluster" shared on oschina. The underlying talks were given at the "Modern Data Architecture" track of QCon Shenzhen 2020, at the dbaplus session "Performance Optimization Practice for a Trillion-Record MongoDB Cluster", and at the MongoDB 2020 year-end conference, and were consistently well received. This article collects the 17 questions MongoDB users asked most frequently after those talks and answers each one in detail. The talks covered: experience promoting MongoDB inside OPPO's internet business, i.e. how a database on the verge of being phased out gradually became one of the company's mainstream databases; common domestic misconceptions about MongoDB (it loses data, it is insecure, it is hard to maintain); a cross-datacenter active-active MongoDB solution that delivers cost, performance, and consistency at the same time; MongoDB threading-model bottlenecks and how to optimize them; parallel migration, improving the MongoDB kernel's scale-out migration rate by several times to tens of times; performance optimization practice for a MongoDB cluster with millions of concurrent reads/writes and hundreds of billions of documents; tens-of-times performance optimization practice for a trillion-record MongoDB cluster; and 80% disk savings: how migrating hundreds of billions of records from a service interface to MongoDB saved nearly a hundred SSD servers. About the author: former Didi Chuxing technical expert, currently head of OPPO's MongoDB document database team.

Configure basic_auth for Prometheus Target

Submitted by 家住魔仙堡 on 2021-02-19 07:07:31
Question: One of the targets under static_configs in my prometheus.yml config file is secured with basic authentication. As a result, an error with the description "Connection refused" is always displayed against that target on the Prometheus Targets page. I have researched how to set up Prometheus to provide the credentials when scraping that particular target but couldn't find a solution. What I did find was how to set it up in the scrape_config section in the docs. This won't work for me
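
For reference, per-target credentials are normally declared with the basic_auth block of a scrape job in prometheus.yml. The fragment below is a minimal sketch; the job name, target address, and credentials are placeholders, not values from the question.

```yaml
scrape_configs:
  - job_name: secured-exporter                 # placeholder job name
    static_configs:
      - targets: ['app.example.internal:9100'] # placeholder target
    basic_auth:
      username: prometheus                     # placeholder credentials
      password: s3cr3t
```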

Prometheus Alertmanager doesn't send alert in k8s

Submitted by 末鹿安然 on 2021-02-18 19:13:33
Question: I'm using prometheus-operator 0.3.4 and Alertmanager 0.20 and it doesn't work, i.e. I can see that the alert fires (in the Prometheus UI, on the Alerts tab) but I never receive the alert by email. Looking at the logs I see the following; any idea? Please see the warning in bold, maybe this is the reason, but I'm not sure how to fix it... This is the prometheus-operator Helm chart I use: https://github.com/helm/charts/tree/master/stable/prometheus-operator level=info ts=2019-12-23T15:42:28.039Z caller
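
For orientation, email delivery ultimately depends on the Alertmanager configuration the operator renders: the default route has to point at a receiver with a working email_configs entry. The sketch below only shows the shape of such a configuration; the SMTP relay, credentials, and addresses are placeholders, not taken from the question.

```yaml
# alertmanager.yml (as rendered by prometheus-operator from its config secret)
global:
  smtp_smarthost: 'smtp.example.com:587'       # placeholder SMTP relay
  smtp_from: 'alertmanager@example.com'
  smtp_auth_username: 'alertmanager@example.com'
  smtp_auth_password: 'changeme'
route:
  receiver: email-oncall                       # default receiver; must match a name below
receivers:
  - name: email-oncall
    email_configs:
      - to: 'oncall@example.com'
```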

Understanding histogram_quantile based on rate in Prometheus

Submitted by 允我心安 on 2021-02-18 06:04:44
Question: According to the Prometheus documentation, in order to get a 95th percentile from a histogram metric I can use the following query: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) Source: https://prometheus.io/docs/practices/histograms/#quantiles Since each bucket of a histogram is a counter, we can calculate the rate of each of the buckets as the per-second average rate of increase of the time series in the range vector. See: https://prometheus.io/docs/prometheus/latest
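
As a side note (not part of the question), this kind of quantile expression is often precomputed with a recording rule so that dashboards and alerts can reuse it cheaply. A minimal sketch, with the group and record names chosen here purely for illustration:

```yaml
groups:
  - name: latency-percentiles                      # illustrative group name
    rules:
      - record: job:http_request_duration_seconds:p95_5m
        expr: >
          histogram_quantile(0.95,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
```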

How to get the quantile of rate in prometheus

Submitted by 試著忘記壹切 on 2021-02-17 05:13:43
Question: I am looking at this article:
# TYPE prometheus_http_request_duration_seconds histogram
prometheus_http_request_duration_seconds_bucket{handler="/",le="0.1"} 25547
prometheus_http_request_duration_seconds_bucket{handler="/",le="0.2"} 26688
prometheus_http_request_duration_seconds_bucket{handler="/",le="0.4"} 27760
prometheus_http_request_duration_seconds_bucket{handler="/",le="1"} 28641
prometheus_http_request_duration_seconds_bucket{handler="/",le="3"} 28782
I am confused about why histogram
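
As a related worked example (a sketch, not taken from the article): because the le buckets are cumulative counters, rating two of them and dividing gives the fraction of requests that completed within a bound, which is a handy way to sanity-check a histogram before reaching for histogram_quantile. The metric and label names follow the samples above; the group and record names are placeholders.

```yaml
groups:
  - name: http-latency-ratios                      # illustrative group name
    rules:
      # share of requests to "/" answered in under 0.2s over the last 5 minutes
      - record: handler:prometheus_http_requests_under_200ms:ratio_rate5m
        expr: >
          sum(rate(prometheus_http_request_duration_seconds_bucket{handler="/",le="0.2"}[5m]))
          /
          sum(rate(prometheus_http_request_duration_seconds_count{handler="/"}[5m]))
```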

A First Look at Open Service Mesh (OSM)

Submitted by 拈花ヽ惹草 on 2021-02-17 02:57:19
Microsoft recently open-sourced a new project named Open Service Mesh [1] and intends to donate it to the CNCF [2]. Introduction: "Open Service Mesh (OSM) is a lightweight, extensible, Cloud Native service mesh that allows users to uniformly manage, secure, and get out-of-the-box observability features for highly dynamic microservice environments." OSM runs an Envoy-based control plane on Kubernetes and can be configured with the SMI APIs. It works by injecting an Envoy proxy as a sidecar. The control plane continuously configures the proxies so that policies, routing rules, and so on stay up to date, while the proxies enforce access-control rules, handle routing, collect metrics, and so forth. (This is essentially the same as the service mesh solutions we commonly see today.) Notable features: an implementation based on the Service Mesh Interface (SMI), mainly including
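
To make the SMI angle concrete, below is a hedged sketch of an SMI TrafficSplit resource, the kind of object an SMI-compliant mesh such as OSM can act on. The API version, namespace, service names, and weights are illustrative assumptions, not values from this article.

```yaml
apiVersion: split.smi-spec.io/v1alpha2   # SMI TrafficSplit API version; varies by mesh/SMI release
kind: TrafficSplit
metadata:
  name: bookstore-split                  # illustrative name
  namespace: bookstore
spec:
  service: bookstore                     # root service that clients address
  backends:                              # traffic is weighted across these backends
    - service: bookstore-v1
      weight: 75
    - service: bookstore-v2
      weight: 25
```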

Installing Alertmanager (Kubernetes alerting)

Submitted by  ̄綄美尐妖づ on 2021-02-16 23:38:55

1. Download Alertmanager: https://prometheus.io/download/
wget https://github.com/prometheus/alertmanager/releases/download/v0.16.0-alpha.0/alertmanager-0.16.0-alpha.0.linux-amd64.tar.gz
# extract
tar xf alertmanager-0.16.0-alpha.0.linux-amd64.tar.gz
mv alertmanager-0.16.0-alpha.0.linux-amd64 /usr/local/alertmanager
# create the data directory
mkdir -p /data/alertmanager
# create a service user
useradd prometheus
chown -R prometheus:prometheus /usr/local/alertmanager /data/alertmanager/
# add a systemd service
vim /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=Alertmanager
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=
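
The excerpt cuts off before Prometheus is pointed at the newly installed Alertmanager, so here is a hedged sketch of that side of the wiring in prometheus.yml; the address and rule-file path are placeholders, not values from the article.

```yaml
# prometheus.yml (fragment) - tell Prometheus where Alertmanager lives
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['127.0.0.1:9093']        # placeholder Alertmanager address
rule_files:
  - /usr/local/prometheus/rules/*.yml        # placeholder path to alerting rule files
```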

Dynamically update prometheus scrape config based on pod labels

Submitted by 左心房为你撑大大i on 2021-02-16 14:54:13
Question: I'm trying to enhance my monitoring and want to expand the number of metrics pulled into Prometheus from our Kube estate. We already have a standalone Prometheus deployment with a hard-coded config file monitoring some bare-metal servers, and it hooks into cAdvisor for generic Pod metrics. What I would like to do is configure Kube to monitor the apache_exporter metrics from a webserver deployed in the cluster, but also dynamically add a 2nd, 3rd etc. webserver as the instances are scaled up.
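
One common way to get that behaviour (a sketch, not the poster's actual setup) is Kubernetes service discovery plus relabelling driven by pod annotations, so each newly scheduled replica is picked up automatically. The annotation names follow the widespread prometheus.io/* convention; the job name is a placeholder.

```yaml
scrape_configs:
  - job_name: kubernetes-pods                # placeholder job name
    kubernetes_sd_configs:
      - role: pod                            # discover every pod in the cluster
    relabel_configs:
      # keep only pods that opt in via the prometheus.io/scrape annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # let the pod choose the scrape port via the prometheus.io/port annotation
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      # carry the pod's "app" label through to the scraped series
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: replace
        target_label: app
```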