Store the largest 5000 numbers from a stream of numbers

Posted by 爱⌒轻易说出口 on 2019-11-26 13:22:25
amit

The simplest solution for this is maintaining a min heap of max size 5000.

  • Every time a new number arrives, check whether the heap holds fewer than 5000 elements; if it does, add the number.
  • Otherwise, check whether the minimum of the heap is smaller than the new element; if it is, pop the minimum and insert the new element instead.
  • When you are done, you have a heap containing the 5000 largest elements.
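The steps above can be sketched in Python with the standard `heapq` module. This is a minimal illustration, not a definitive implementation: the function name `largest_k` and its parameter `k` are assumptions made for the example, and the stream is assumed to be any iterable of mutually comparable numbers.

```python
import heapq
from typing import Iterable, List


def largest_k(stream: Iterable[float], k: int = 5000) -> List[float]:
    """Return the k largest values seen in the stream, in heap order."""
    if k <= 0:
        return []
    heap: List[float] = []  # min heap: heap[0] is the smallest value kept
    for x in stream:
        if len(heap) < k:
            # Heap not full yet: always keep the new number.
            heapq.heappush(heap, x)
        elif heap[0] < x:
            # New number beats the current minimum: pop it and insert x.
            heapq.heapreplace(heap, x)
    return heap
```

Calling `sorted(largest_k(numbers), reverse=True)` gives the kept values from largest to smallest. Memory use stays at k elements no matter how long the stream is.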

This solution has O(n log k) time complexity, where n is the number of elements in the stream and k is the number of elements you need (5000 in your case).

It can also be done in O(n) using a selection algorithm: store all the elements, then find the 5001st largest element and return everything bigger than it. But it is harder to implement, and for reasonably sized input it might not be faster. Also, if the stream contains duplicates, more processing is needed, because values equal to the 5001st largest element have to be handled separately.
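For comparison, here is a rough sketch of the selection approach. It uses a quickselect-style partition, which runs in expected (not worst-case) linear time; a worst-case O(n) variant such as median of medians is not shown. The function name `top_k_by_selection` is hypothetical, and the sketch assumes all values are stored in memory and are mutually comparable (no NaN). Duplicates are handled by keeping only as many copies of the pivot value as are still needed.

```python
import random
from typing import List, Sequence


def top_k_by_selection(values: Sequence[float], k: int) -> List[float]:
    """Return the k largest values (in no particular order)."""
    items = list(values)
    if k <= 0:
        return []
    if k >= len(items):
        return items
    result: List[float] = []
    while k > 0:
        pivot = random.choice(items)
        greater = [v for v in items if v > pivot]
        equal = [v for v in items if v == pivot]
        less = [v for v in items if v < pivot]
        if k <= len(greater):
            # All k answers lie among the values above the pivot.
            items = greater
        elif k <= len(greater) + len(equal):
            # Take everything above the pivot plus just enough duplicates.
            result.extend(greater)
            result.extend(equal[: k - len(greater)])
            k = 0
        else:
            # Keep everything >= pivot and continue below it.
            result.extend(greater)
            result.extend(equal)
            k -= len(greater) + len(equal)
            items = less
    return result
```

Unlike the heap, this needs the whole input in memory before it can start, which is one reason it may not pay off for a stream.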

Use a (minimum) priority queue. Add each incoming item to the queue, and once the queue already holds 5,000 items, remove the minimum (top) element after every insertion. The queue then always contains the 5,000 largest elements seen so far, and when the input stops, just remove the contents. This MinPQ is also called a heap, but that is an overloaded term. Each insertion or deletion takes about log2(N) comparisons. Since N maxes out at 5,000, the total work is just over 12 [log2(4096) = 12] times the number of items you are processing.
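A small sketch of this insert-then-remove variant, again in Python, also checks the arithmetic: log2(5000) is roughly 12.3. The names `K`, `LOG2_K` and `keep_largest` are assumptions for illustration. `heapq.heappushpop` pushes the new item and then pops the smallest one in a single call, which matches the "add, then remove the minimum" description.

```python
import heapq
import math
from typing import Iterable, List

K = 5000
LOG2_K = math.log2(K)  # roughly 12.3, i.e. just over log2(4096) = 12


def keep_largest(stream: Iterable[float], k: int = K) -> List[float]:
    """Keep the k largest items by adding each item, then dropping the minimum."""
    pq: List[float] = []  # min priority queue backed by a list
    for item in stream:
        if len(pq) < k:
            heapq.heappush(pq, item)
        else:
            # Queue is full: add the item, then remove the smallest element.
            heapq.heappushpop(pq, item)
    return pq
```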

An excellent source of info is Algorithms (4th Edition) by Robert Sedgewick and Kevin Wayne. There is also an excellent MOOC on coursera.org that is based on this text.
