prometheus-alertmanager

Prometheus Alertmanager doesn't send alerts on k8s

末鹿安然 submitted on 2021-02-18 19:13:33
Question: I'm using Prometheus Operator 0.3.4 and Alertmanager 0.20 and it doesn't work: I can see that the alert fires (in the Prometheus UI, on the Alerts tab), but I never receive the alert by email. Looking at the logs I see the following. Any idea? Please see the warning in bold; maybe it is the cause, but I'm not sure how to fix it. This is the Prometheus Operator Helm chart I use: https://github.com/helm/charts/tree/master/stable/prometheus-operator level=info ts=2019-12-23T15:42:28.039Z caller

Trying to configure prometheus with alert manager but getting error with rules file

て烟熏妆下的殇ゞ submitted on 2021-01-29 04:52:50
Question: In my prometheus.yml, the rules file is called rules.yml and it contains: --- groups: - name: example rules: - alert: ServiceDown expr: up == 0 for: 2m labels: severity: critical annotations: summary: cannot connect to {{ $labels.job }} When I run sudo ./promtool check config rules.yml I get the error: Checking rules.yml FAILED: parsing YAML file rules.yml: yaml: unmarshal errors: line 2: field groups not found in type config.plain I am not sure what is wrong, as I am following this https:/
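The type name config.plain in the error suggests that promtool parsed rules.yml as a main Prometheus configuration file. The check config subcommand expects prometheus.yml, while rule files are validated with check rules. A minimal sketch of the same rules.yml laid out as YAML, with the two commands shown as comments (the file locations are assumed to be the current directory):

```yaml
# rules.yml
# Validate this file on its own with:
#   ./promtool check rules rules.yml
# Validate the main config (which also checks the rule files listed
# under rule_files in prometheus.yml) with:
#   ./promtool check config prometheus.yml
groups:
  - name: example
    rules:
      - alert: ServiceDown
        expr: up == 0          # target failed its last scrape
        for: 2m                # must stay true for 2 minutes before firing
        labels:
          severity: critical
        annotations:
          summary: "cannot connect to {{ $labels.job }}"
```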

sum of rate function in prometheus

三世轮回 submitted on 2020-12-27 07:09:37
Question: Given the following Prometheus time series called requests: [figure] the range-vector query requests[3 seconds] is: [figure] and the rate of that range vector, rate(requests[3 sec]) (computed by the shown formula), is: [figure] My question is: what is sum(rate(requests[3 sec])) evaluated at seconds 5, 4 and 3 respectively? Is it 16.5, 6.5 and 1? Any idea? Answer 1: You are misunderstanding the purpose of sum. It does not sum over time; it sums over the dimensions (label sets) of your metric. In your example,
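To make the distinction concrete, here is a sketch of two recording rules. It assumes the requests series carries an instance label (an assumption; the original post does not show its labels) and writes the window as 3s, since PromQL durations do not accept "3 seconds". The group and rule names are hypothetical.

```yaml
groups:
  - name: requests_rate_example        # hypothetical group name
    rules:
      # At each evaluation timestamp, add the per-series rates together
      # across all label sets. Nothing is accumulated over time.
      - record: job:requests:rate3s_sum
        expr: sum(rate(requests[3s]))
      # Same aggregation, but keep one output series per instance
      # (assumes an "instance" label exists on requests).
      - record: instance:requests:rate3s
        expr: sum by (instance) (rate(requests[3s]))
```

If requests is a single series with no other label sets, sum(rate(requests[3s])) returns the same value as rate(requests[3s]) at each timestamp, only with the labels dropped.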

How to snooze prometheus alert for specific time

北战南征 submitted on 2020-12-12 11:43:46
Question: I have run into an issue with a Prometheus memory alert. When I take a backup of GitLab, memory usage goes up to 95%. I want to snooze the memory alert for a specific time window; e.g. if I take the backup at 2 AM, I need the Prometheus memory alert snoozed during that time. Is it possible? Answer 1: As Marcelo said, there is no way to schedule a silence, but if the backup runs at a regular interval (say every night from 2am to 3am), you can build that into the alert expression. - alert: OutOfMemory expr: node_memory
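The idea of gating an alert by time of day can be sketched as follows. This is not the answer's own expression, which is cut off above. The node_exporter metric names, the 5% threshold, and the 5m hold time are assumptions chosen for illustration. PromQL's hour() returns the current hour in UTC (0 to 23), so the window has to be shifted if the backup is scheduled in local time.

```yaml
groups:
  - name: memory_example               # hypothetical group name
    rules:
      - alert: OutOfMemory
        # Left side: less than 5% of memory available (assumed threshold).
        # Right side: non-empty only outside the 02:00-03:00 UTC window,
        # so the alert cannot fire while the backup is running.
        expr: |
          (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 5)
          and on() (hour() < 2 or hour() >= 3)
        for: 5m
        labels:
          severity: warning
```

The and on() clause matches on no labels, so the time condition acts as a simple on/off switch for every series on the left-hand side.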