Write a program to find 100 largest numbers out of an array of 1 billion numbers

深忆病人 2020-11-29 14:04

I recently attended an interview where I was asked: "Write a program to find the 100 largest numbers out of an array of 1 billion numbers."

I was only able to give a br

30 Answers
  • 2020-11-29 14:44

    First take 1000 elements and add them to a max-heap. Then take out the 100 largest elements and store them somewhere. Next, pick the next 900 elements from the file and add them to the heap along with those 100 largest elements.

    Keep repeating this process of taking the top 100 elements from the heap and adding 900 new elements from the file.

    The final pick of 100 elements gives the 100 largest of the billion numbers.
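
    A minimal Python sketch of the batching procedure above. The function name `top_k_batched` and its parameters are made up for illustration, and because Python's `heapq` is a min-heap, values are negated to get max-heap behaviour.

```python
import heapq
from itertools import islice


def top_k_batched(numbers, k=100, batch=1000):
    """Return the k largest values, reading `numbers` in batches."""
    if batch <= k:
        raise ValueError("batch must be larger than k")
    it = iter(numbers)
    best = []        # the k largest seen so far, carried into the next round
    refill = batch   # first round reads `batch` items, later rounds batch - k
    while True:
        chunk = list(islice(it, refill))
        if not chunk:
            break
        # heapq is a min-heap, so store negated values to pop the largest first.
        heap = [-v for v in best + chunk]
        heapq.heapify(heap)
        best = [-heapq.heappop(heap) for _ in range(min(k, len(heap)))]
        refill = batch - k
    return best      # sorted from largest to smallest
```

    Only about 1000 values are held in memory at any time, so the input can be a file or generator rather than a full array.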

  • 2020-11-29 14:45

    Take the first 100 numbers of the billion and sort them. Now iterate through the rest of the billion; if the current number is higher than the smallest of the 100, insert it in sorted order and drop the smallest. What you end up with is something much closer to O(n) in the size of the set.
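
    One way this could look in Python, as a sketch only. The name `top_k_sorted_buffer` is hypothetical, and the buffer here is a plain list kept in ascending order with the standard `bisect` module.

```python
import bisect
from itertools import islice


def top_k_sorted_buffer(numbers, k=100):
    """Return the k largest values in ascending order."""
    it = iter(numbers)
    buf = sorted(islice(it, k))    # buf[0] is the smallest value kept
    for v in it:
        if buf and v > buf[0]:
            bisect.insort(buf, v)  # binary search for the slot, then shift
            del buf[0]             # drop the old minimum
    return buf
```

    `bisect.insort` finds the position in O(log k) comparisons but still shifts up to k elements, which is harmless when k is fixed at 100.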

  • 2020-11-29 14:45

    A very easy solution would be to iterate through the array 100 times, which is O(n) because the number of passes is a constant.

    Each time, pull out the largest number and either change its value to the minimum value, so that you don't see it in the next iteration, or keep track of the indexes of previous answers (tracking indexes lets the original array contain the same number more than once). After 100 iterations, you have the 100 largest numbers.
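
    A rough Python sketch of the index-tracking variant described above. `top_k_by_repeated_scan` is an illustrative name, and the input is assumed to be an in-memory list.

```python
def top_k_by_repeated_scan(values, k=100):
    """Scan the list k times, taking the largest unused value each time."""
    taken = set()    # indexes already reported, so duplicates are handled
    result = []
    for _ in range(min(k, len(values))):
        best_index = None
        for i, v in enumerate(values):
            if i in taken:
                continue
            if best_index is None or v > values[best_index]:
                best_index = i
        taken.add(best_index)
        result.append(values[best_index])
    return result    # largest first
```

    The original array is left untouched, at the cost of k full passes over it.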

  • 2020-11-29 14:46

    Managing a separate list is extra work and you have to move things around the whole list every time you find another replacement. Just qsort it and take the top 100.

  • 2020-11-29 14:48
    Although this question asks for the top 100 numbers, I will generalize and write x. Still, I will treat x as a constant.

    Algorithm Biggest x elements from n:

    I will call the return value LIST. It is a set of x elements (in my opinion it should be a linked list).

    • The first x elements are taken from the pool "as they come" and sorted in LIST (this is done in constant time since x is treated as a constant: O(x log(x)) time).
    • For every element that comes next, we check whether it is bigger than the smallest element in LIST; if it is, we pop out the smallest and insert the current element into LIST. Since LIST is ordered, every element should find its place in logarithmic time (binary search), and since it is an ordered list, insertion is not a problem. Every step is also done in constant time (O(log(x)) time).

    So, what is the worst case scenario?

    x log(x) + (n-x)(log(x)+1) = n log(x) + n - x

    So that is O(n) time in the worst case. The +1 is the check whether the number is greater than the smallest one in LIST. The expected time for the average case will depend on the distribution of those n elements.
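
    To get a feel for the size of that bound, here is a small Python calculation for n = 1 billion and x = 100. The base of the logarithm is not stated above, so base 2 is an assumption here, and `worst_case_steps` is just an illustrative helper.

```python
import math


def worst_case_steps(n, x):
    """Evaluate n*log2(x) + n - x (log base 2 is an assumption)."""
    return n * math.log2(x) + n - x


n, x = 1_000_000_000, 100
steps = worst_case_steps(n, x)     # about 7.6e9
full_sort = n * math.log2(n)       # about 3.0e10, for comparison with sorting
print(f"{steps:.3e} vs {full_sort:.3e}")
```

    Under that assumption the bound is roughly a quarter of the n log(n) cost of sorting everything.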

    Possible improvements

    This algorithm can be slightly improved for the worst-case scenario, but IMHO (I cannot prove this claim) that will degrade average behavior. Asymptotic behavior will be the same.

    The improvement is that we do not check whether the element is greater than the smallest. For each element we try to insert it, and if it is smaller than the smallest we disregard it. Although that sounds preposterous, if we consider only the worst-case scenario we will have

    x log(x) + (n-x)log(x) = n log(x)

    operations.

    For this use case I don't see any further improvements. Yet you must ask yourself: what if I have to do this more than log(n) times and for different values of x? Obviously we would sort the array once in O(n log(n)) and take the top x elements whenever we need them.
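
    The sort-once idea for repeated queries might be sketched like this in Python. The class name `SortedTopX` is hypothetical, and the whole array is assumed to fit in memory.

```python
class SortedTopX:
    """Sort once, then answer "top x" queries for any x by slicing."""

    def __init__(self, values):
        # One O(n log n) sort up front, largest value first.
        self._descending = sorted(values, reverse=True)

    def top(self, x):
        # Each query is just a slice of the pre-sorted list.
        return self._descending[:x]
```

    For example, `SortedTopX([4, 9, 1, 7]).top(2)` gives `[9, 7]`.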

  • 2020-11-29 14:48

    Finding the top 100 out of a billion numbers is best done using a min-heap of 100 elements.

    First prime the min-heap with the first 100 numbers encountered. The min-heap will store the smallest of the first 100 numbers at the root (top).

    Now, as you go along the rest of the numbers, compare each one only with the root (the smallest of the 100).

    If the new number encountered is larger than the root of the min-heap, replace the root with that number; otherwise ignore it.

    As part of inserting the new number into the min-heap, the smallest number in the heap will come to the top (root).

    Once we have gone through all the numbers, we will have the largest 100 numbers in the min-heap.
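
    A short Python sketch of the min-heap approach above, using the standard `heapq` module. The function name `top_k_min_heap` is only for illustration.

```python
import heapq
from itertools import islice


def top_k_min_heap(numbers, k=100):
    """Return the k largest values (in heap order, not sorted)."""
    it = iter(numbers)
    heap = list(islice(it, k))     # prime the heap with the first k numbers
    heapq.heapify(heap)            # heap[0] is now the smallest of them
    for v in it:
        if heap and v > heap[0]:
            # Pop the root and push v in a single O(log k) operation.
            heapq.heapreplace(heap, v)
    return heap
```

    Call `sorted(top_k_min_heap(data), reverse=True)` if the result is needed in order. The standard library's `heapq.nlargest(100, data)` does the same job in one call.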
