How to get the quantile of rate in prometheus

Posted by 試著忘記壹切 on 2021-02-17 05:13:43

Question


I am looking at this article

# TYPE prometheus_http_request_duration_seconds histogram
prometheus_http_request_duration_seconds_bucket{handler="/",le="0.1"} 25547
prometheus_http_request_duration_seconds_bucket{handler="/",le="0.2"} 26688
prometheus_http_request_duration_seconds_bucket{handler="/",le="0.4"} 27760
prometheus_http_request_duration_seconds_bucket{handler="/",le="1"} 28641
prometheus_http_request_duration_seconds_bucket{handler="/",le="3"} 28782

I am confused about why

histogram_quantile(0.9, 
    rate(prometheus_http_request_duration_seconds_bucket[5m])
)

doesn't give the quantile of the rate, in units of observed events per second, but instead gives the quantile of the request duration, in units of seconds per observed event.

rate(prometheus_http_request_duration_seconds_bucket[5m])

should give the number of observed events per second in each bucket, averaged over 5 minutes.

I would imagine histogram_quantile would then give the quantiles of that rate.

I must be misunderstanding something.


Answer 1:


The rate() function is there to specify the time window for the quantile calculation, as described in the documentation of histogram_quantile(). The query translates as: "over the last 5 minutes, what is the maximum HTTP response time experienced by 90% of my users?"

The histogram_quantile() function interpolates quantile values by assuming a linear distribution within each bucket, with the le label giving the upper bound of the observations counted in that bucket. Each bucket is a counter of the observations less than or equal to le since the start of the process. rate() makes the link by computing, for each bucket, the average number of observations per second over the time window; the response time is then interpolated from those per-bucket values.
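
To make the role of rate() concrete, here is a minimal Python sketch of what it does to each bucket counter. It is a simplification, not Prometheus's implementation: it takes just two scrapes, and it ignores counter resets and the extrapolation Prometheus applies at the window edges. The function name simple_rate and the counter values are hypothetical.

```python
def simple_rate(first, last, window_seconds):
    """Average per-second increase of each cumulative bucket counter.

    Simplified sketch: no counter-reset handling, no extrapolation.
    """
    return {le: (last[le] - first[le]) / window_seconds for le in last}


# Hypothetical counter values, 5 minutes apart (not taken from the question).
first_scrape = {"0.1": 100.0, "0.2": 150.0, "+Inf": 160.0}
last_scrape = {"0.1": 160.0, "0.2": 240.0, "+Inf": 260.0}

rates = simple_rate(first_scrape, last_scrape, 5 * 60)
for le, per_second in rates.items():
    print(f"le={le}: {per_second:.3f} observations/second")
```

The result is still a set of cumulative buckets keyed by le, only restricted to the last 5 minutes and divided by a constant, which is why histogram_quantile() can still read a duration out of it.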

You are right that it is not a 100% accurate measure because of the averaging, but the function already makes a lot of assumptions, and the choice of buckets introduces bias of its own.

I guess you could use irate() to compute instantaneous quantiles, but chances are the result would be noisier.




Answer 2:


The code for histogram_quantile is available in the Prometheus source.

Take an example.

Assume the original cumulative buckets are:
[50][100][150][200][200] with corresponding upper bounds 5s, 10s, 15s, 20s, +Inf.

If those counters grew by 20, 40, 60, 80 and 80 over the last 5 minutes, then rate(xx[5m]) returns buckets like this:
[20/(5*60)][40/(5*60)][60/(5*60)][80/(5*60)][80/(5*60)]

histogram_quantile passes the returned buckets to another function, bucketQuantile.
It uses roughly the following logic to compute the percentile:

1) get the total rank of the percentile
for the 90th percentile this is 0.9 * total count = 0.9 * (80/(5*60))
2) compute the value of the 90th percentile
the last upper bound before the rank position is 15 s;
the upper bound of the bucket containing the rank position is 20 s;
the count in the bucket that contains the 90th percentile is 80/(5*60) - 60/(5*60);
the rank within that single bucket is 0.9 * 80/(5*60) - 60/(5*60);
finally, the value of the 90th percentile is: 15 s + (rank within bucket / count in bucket) * (20 s - 15 s) = 15 + 5 * ( (0.9 * 80/(5*60) - 60/(5*60)) / (80/(5*60) - 60/(5*60)) ) =
15 + 5 * ( (0.9*80 - 60)/(80 - 60) ) = 15 + 5 * (12/20) = 15 + 5*0.6 = 18 s

That's it. You can see that the denominator 5*60 has no effect on the computation, so the rate() function is just used to specify the 5-minute time window.
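
The steps above can be sketched in Python as follows. This is a simplified illustration of the interpolation logic, not a port of Prometheus's bucketQuantile: it assumes the buckets are already sorted, cumulative, and end with a +Inf bucket, and the function name bucket_quantile is my own. The bucket width (20 s - 15 s = 5 s) is the factor that multiplies the interpolated fraction.

```python
import math


def bucket_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) pairs, sorted by
    upper_bound, with math.inf as the last upper bound.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[idx]
    # The lowest bucket is assumed to start at 0.
    lower, prev_count = (0.0, 0.0) if idx == 0 else buckets[idx - 1]
    # Linear interpolation inside the bucket.
    return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)


bounds = [5.0, 10.0, 15.0, 20.0, math.inf]
increases = [20.0, 40.0, 60.0, 80.0, 80.0]  # growth over the last 5 minutes

raw = list(zip(bounds, increases))
per_second = [(le, c / (5 * 60)) for le, c in raw]  # what rate() returns

p90_raw = bucket_quantile(0.9, raw)
p90_rate = bucket_quantile(0.9, per_second)
# Both are 18 seconds, up to floating-point rounding.
print(p90_raw, p90_rate)
```

Dividing every bucket by the same constant scales the rank and the counts together, so the ratio inside the interpolation is unchanged. That is why the result is a duration in seconds regardless of the window length.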



Source: https://stackoverflow.com/questions/60962520/how-to-get-the-quantile-of-rate-in-prometheus
