Bloom Filter: evaluating false positive rate

Submitted by 不想你离开。 on 2019-12-23 05:23:35

Question


Given a fixed number of bits (i.e. slots) (m) and a fixed number of hash functions (k), how does one compute the theoretical false positive rate (p)?

According to Wikipedia (http://en.wikipedia.org/wiki/Bloom_filter), for a false positive rate (p) and a number of items (n), the number of bits (m) needed is given by m = -n * ln(p) / (ln(2)^2), and the optimal number of hash functions (k) is given by k = (m/n) * ln(2), where ln is the natural logarithm.

From the formulas given on the Wikipedia page, I guess I could evaluate the theoretical false positive rate (p) as follows: p = (1 - e^(-k * n / m))^k

But Wikipedia has another formula for (p): p = e^(-(m/n) * ln(2)^2), which, I suppose, assumes that (k) is the optimal number of hash functions.
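The two expressions for (p) can be reconciled by substituting the optimal (k) into the general formula. A short derivation, writing ln for the natural logarithm and assuming (k) is allowed to take the non-integer value (m/n) ln 2:

```latex
\begin{align*}
k &= \frac{m}{n}\ln 2
  \;\Longrightarrow\; e^{-kn/m} = e^{-\ln 2} = \tfrac{1}{2}, \\
p &= \left(1 - e^{-kn/m}\right)^{k}
   = \left(\tfrac{1}{2}\right)^{k}
   = e^{-k \ln 2}
   = e^{-\frac{m}{n}(\ln 2)^{2}}.
\end{align*}
```

So the second formula is the first one evaluated at the optimal (k); for any other (k), only the first one applies.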

For my example, I took n = 1000000 and m = n * 2. The optimal value for (k) would be 1.386, and the theoretical false positive rate (p) would be 0.382 according to the previous formula. Now let's choose the number of hash functions, compute the theoretical false positive rate (p) for that fixed (k), and compute the theoretical number of bits (m') that would be needed to reach that (p) with an optimal (k):

for k = 1, p = .393 and m' = 1941401
for k = 2, p = .399 and m' = 1909344
for k = 3, p = .469 and m' = 1576527
for k = 4, p = .559 and m' = 1210636
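The figures above can be reproduced with a few lines of Python. This is only a sketch of the two formulas quoted from Wikipedia; the function names `false_positive_rate` and `bits_for_rate` are made up for illustration and do not come from any library.

```python
import math


def false_positive_rate(k: int, n: int, m: int) -> float:
    """Approximate false positive rate for k hash functions, n items, m bits."""
    return (1.0 - math.exp(-k * n / m)) ** k


def bits_for_rate(p: float, n: int) -> float:
    """Bits needed to reach rate p with n items, assuming the optimal k is used."""
    return -n * math.log(p) / (math.log(2) ** 2)


n = 1_000_000
m = 2 * n
for k in range(1, 5):
    p = false_positive_rate(k, n, m)
    print(f"k = {k}: p = {p:.3f}, m' = {bits_for_rate(p, n):.0f}")
```

The printed values should agree with the list above, up to rounding in the last digit of (p).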

The more bits are set in the filter, the more false positives we get. Seems logical.

But could someone confirm that the formula p = (1 - e^(-k * n / m))^k is correct for the theoretical false positive rate given fixed (k), (m) and (n)?
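One way to sanity-check the formula is a small simulation. The sketch below is not a real Bloom filter implementation: it assumes ideal hash functions, modelled by drawing each bit position uniformly at random, and uses a smaller (n) and (m) with the same m/n ratio so that it runs quickly. The names `theoretical_rate` and `simulated_rate` and the parameters `queries` and `seed` are hypothetical.

```python
import math
import random


def theoretical_rate(k: int, n: int, m: int) -> float:
    """Formula under test: p = (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


def simulated_rate(k: int, n: int, m: int, queries: int = 20_000, seed: int = 0) -> float:
    """Estimate the false positive rate, assuming ideal uniform hashing."""
    rng = random.Random(seed)
    bits = bytearray(m)
    # Insert n items: each item sets k uniformly random bit positions.
    for _ in range(n):
        for _ in range(k):
            bits[rng.randrange(m)] = 1
    # Query items that were never inserted: a false positive occurs
    # when all k of their (random) positions happen to be set.
    hits = 0
    for _ in range(queries):
        if all(bits[rng.randrange(m)] for _ in range(k)):
            hits += 1
    return hits / queries


n, m = 5_000, 10_000  # same m/n ratio as the example in the question
for k in range(1, 5):
    print(k, round(theoretical_rate(k, n, m), 3), round(simulated_rate(k, n, m), 3))
```

Under these assumptions the simulated rate should land close to the formula for each (k). Real hash functions are only approximately uniform and independent, so a real filter may deviate somewhat.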

Note: this question seems to have been asked already here: "With fixed number of functions, how can I calculate the size of a Bloom Filter given the probability of false positives?" but no answer there matches my exact question. "How many hash functions does my bloom filter need?" might be of interest, but again it's not exactly the same.

Regards


Answer 1:


m – number of elements in bit array
n – number of items in collection
p – false positive probability // 0.0 – 1.0
^ – power

p = e^(-(m/n) * (ln(2)^2));

I wrote a math-friendly tutorial on Bloom filters: http://techeffigy.wordpress.com/2014/06/05/bloom-filter-tutorial/



Source: https://stackoverflow.com/questions/15952524/bloom-filter-evaluating-false-positive-rate
