Write a program to find 100 largest numbers out of an array of 1 billion numbers

深忆病人 2020-11-29 14:04

I recently attended an interview where I was asked to "write a program to find the 100 largest numbers out of an array of 1 billion numbers."

I was only able to give a br

30 answers
  •  旧时难觅i
    2020-11-29 14:48

     Although this question asks for the top 100 numbers, I will
     generalize and write x instead. Still, I will treat x as a constant.
    

    Algorithm: the x largest elements out of n

    I will call the return value LIST. It is a set of x elements (in my opinion it should be a linked list).

    • The first x elements are taken from the pool "as they come" and sorted into LIST. This takes O(x log(x)) time, which is constant because x is treated as a constant.
    • For every element that comes next, we check whether it is bigger than the smallest element in LIST. If it is, we pop the smallest and insert the current element into LIST. Because LIST is kept sorted, the position of the new element can be found in logarithmic time by binary search, provided the structure allows random access (a plain linked list does not; a sorted array or a balanced tree does). Since x is treated as a constant, every step is also done in constant time; the analysis below counts it as O(log(x)) comparisons.
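
The two steps above can be sketched in Python as follows. This is only an illustration under some assumptions: the function name `top_x_sorted_buffer` is made up for this example, and LIST is a plain Python list kept sorted with the standard `bisect` module rather than a linked list, so the binary search is valid but each insertion shifts up to x elements.

```python
import bisect
import itertools


def top_x_sorted_buffer(numbers, x):
    """Return the x largest values of an iterable, in ascending order.

    Hypothetical helper for illustration; LIST is a sorted Python list.
    """
    if x <= 0:
        return []
    it = iter(numbers)
    # Step 1: take the first x elements "as they come" and sort them.
    buffer = sorted(itertools.islice(it, x))
    # Step 2: compare every later element with the current smallest.
    for value in it:
        if value > buffer[0]:
            # Drop the smallest, then binary-search the insertion point.
            # Both calls shift list items, which costs O(x) on a list;
            # that is still constant when x is treated as a constant.
            buffer.pop(0)
            bisect.insort(buffer, value)
    return buffer


if __name__ == "__main__":
    print(top_x_sorted_buffer([5, 1, 9, 3, 7, 8, 2], 3))
```

Because the input is consumed through an iterator, the billion numbers never need to be in memory at once; only the x kept values are stored.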

    So, what is the worst case scenario?

    x log(x) + (n-x)(log(x)+1) = nlog(x) + n - x

    So that is O(n) time in the worst case, because x is treated as a constant. The +1 is the check whether the number is greater than the smallest one in LIST. The expected time for the average case depends on the distribution of those n elements.

    Possible improvements

    This algorithm can be slightly improved for the worst case, but IMHO (I cannot prove this claim) that will degrade the average behavior. The asymptotic behavior stays the same.

    The improvement is to skip the check against the smallest element. For each element we simply try to insert it, and if it turns out to be smaller than the smallest we disregard it. That may sound preposterous, but if we look only at the worst case we get

    x log(x) + (n-x)log(x) = nlog(x)

    operations.
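
A possible reading of this variant, again as a hedged sketch: the name `top_x_always_search` is hypothetical, and a sorted Python list stands in for LIST. Every element goes straight through the binary search, and an element that lands at position 0 is the one that gets disregarded.

```python
import bisect
import itertools


def top_x_always_search(numbers, x):
    """Variant without the separate 'greater than smallest' check.

    Hypothetical helper for illustration; returns values in ascending order.
    """
    if x <= 0:
        return []
    it = iter(numbers)
    buffer = sorted(itertools.islice(it, x))
    for value in it:
        # Always run the binary search; no comparison with buffer[0] first.
        pos = bisect.bisect_left(buffer, value)
        if pos == 0:
            # Not larger than the current smallest: disregard it.
            continue
        buffer.insert(pos, value)
        buffer.pop(0)  # evict the old smallest to keep exactly x values
    return buffer
```

On mostly small inputs this does a full binary search where the first sketch needs one comparison, which is the average-case degradation the answer suspects.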

    For this use case I don't see any further improvements. Yet you must ask yourself - what if I have to do this more than log(n) times, and for different values of x? In that case we would sort the array once in O(n log(n)) and take the x largest elements whenever we need them.
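
The repeated-query case might look like the sketch below. The class name `SortedOnce` and its method `largest` are invented for this example, and it assumes the whole array fits in memory, which the streaming approach does not require.

```python
class SortedOnce:
    """Sort once, then answer 'x largest' queries by slicing.

    Hypothetical helper for illustration only.
    """

    def __init__(self, numbers):
        # One-time O(n log(n)) cost, paid up front.
        self._descending = sorted(numbers, reverse=True)

    def largest(self, x):
        # Each query only copies the first x items of the sorted data.
        return self._descending[:max(x, 0)]


if __name__ == "__main__":
    queries = SortedOnce([4, 1, 7, 3])
    print(queries.largest(2))
    print(queries.largest(3))
```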
