Expected worst-case time complexity of chained hash table lookups?

这一生的挚爱 提交于 2019-12-05 18:36:39

For fixed α, the expected worst time is always Θ(log n / log log n). However if you make α a function of n, then the expected worst time can change. For instance if α = O(n) then the expected worst time is O(n) (that's the case where you have a fixed number of hash buckets).

In general the distribution of items into buckets is approximately a Poisson distribution, the odds of a random bucket having i items is αi e / i!. The worst case is just the m'th worst out of m close to independent observations. (Not entirely independent, but fairly close to it.) The m'th worst out of m observations tends to be something whose odds of happening are about 1/m times. (More precisely the distribution is given by a Β distribution, but for our analysis 1/m is good enough.)

As you head into the tail of the Poisson distribution the growth of the i! term dominates everything else, so the cumulative probability of everything above a given i is smaller than the probability of selecting i itself. So to a good approximation you can figure out the expected value by solving for:

αi e-α / i! = 1/m = 1/(n/α) = α/n

Take logs of both sides and we get:

i log(α) - α - (i log(i) - i + O(log(i)) = log(α) - log(n)
log(n) - α = i log(i) - i - i log(α) + O(log(i))

If we hold α constant then this is:

log(n) = i log(i) + O(i)

Can this work if i has the form k log(n) / log(log(n)) with k = Θ(1)? Let's try it:

log(n) = (k log(n) / log(log(n))) (log(k) + log(log(n)) - log(log(log(n)))) + O(log(log(n)))
       = k (log(n) + o(log(n)) + o(log(n))

And then we get the sharper estimate that, for any fixed load average α, the expected worst time is (1 + o(1)) log(n) / log(log(n))

After some searching, I came across this research paper that gives a complete analysis of the expected worst-case behavior of a whole bunch of different types of hash tables, including chained hash tables. The author gives as an answer that the expected length is approximately Γ-1(m), where m is the number of buckets and Γ is the Gamma function. Assuming that α is a constant, this is approximately ln m / ln ln m.

Hope this helps!

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!