Retrieving the top 100 numbers from one hundred million numbers

北荒 2020-11-30 19:36

One of my friends was asked this question:

Retrieve the 100 largest numbers from one hundred million numbers

in a rece

12 Answers
  • 2020-11-30 20:09

    By TOP 100, do you mean the 100 largest? If so:

    SELECT TOP 100 Number FROM RidiculouslyLargeTable ORDER BY Number DESC
    

    Make sure you tell the interviewer that you assume the table is indexed properly.

  • 2020-11-30 20:09
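    #include <limits.h>   /* INT_MIN */
    #include <stddef.h>   /* size_t */
    #include <string.h>   /* memmove */

    #define TOP 100

    /* Fill result[0..TOP-1] with the TOP largest of numbers[0..n-1], in descending order. */
    void top100(const int *numbers, size_t n, int result[TOP])
    {
        /* Start from INT_MIN rather than 0 so negative inputs are handled. */
        for( int j = 0 ; j < TOP ; j++ )
            result[j] = INT_MIN;

        for( size_t i = 0 ; i < n ; i++ )   /* n = 100000000 for one hundred million */
        {
            for( int j = 0 ; j < TOP ; j++ )
            {
                 if( numbers[i] > result[j] )
                 {
                      /* Shift result[j..TOP-2] down one slot. The ranges overlap,
                         so use memmove, and move TOP-1-j elements to stay in bounds. */
                      memmove(result+j+1, result+j, (TOP-1-j)*sizeof(int));
                      result[j] = numbers[i];
                      break;
                 }
            }
        }
    }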
    
  • 2020-11-30 20:11

    Mergesort in batches of 100, then only keep the top 100.

    Incidentally, you can scale this in all sorts of directions, including concurrently.
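
    A minimal sketch of this batching idea in Python, under a few assumptions: `top_k_batched` and `merge_desc` are hypothetical names, the input is any iterable of numbers, and Python's built-in `sorted` (Timsort, a merge-sort hybrid) stands in for a hand-written mergesort on each batch. Only the current top 100 and one batch are held in memory at a time.

```python
from itertools import islice


def merge_desc(a, b, limit):
    """Merge two descending lists, keeping at most `limit` items."""
    out = []
    i = j = 0
    while len(out) < limit and (i < len(a) or j < len(b)):
        # Take from `a` when `b` is exhausted or a's head is at least as large.
        if j >= len(b) or (i < len(a) and a[i] >= b[j]):
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out


def top_k_batched(numbers, k=100):
    """Return the k largest values, in descending order, reading k items at a time."""
    it = iter(numbers)
    best = []  # current top k, descending
    while True:
        batch = sorted(islice(it, k), reverse=True)
        if not batch:
            return best
        best = merge_desc(best, batch, k)
```

    Because each batch is merged independently, separate workers could each reduce their own slice of the input to a top 100 and the partial results could then be merged the same way.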

  • 2020-11-30 20:11

    @darius's answer can actually be improved!
    By "pruning", that is, deferring the heap-replace operation until it is required.

    Suppose we have a=1000 at the top of the heap.
    It has children b and c.
    We know that b,c>1000.

          a=1000
      +-----|-----+
     b>a         c>a
    
    
    
    
    We now read the next number x=1035.
    Since x>a we should discard a.
    Instead we store (x=1035, a=1000) at the root.
    We do not (yet) bubble down the new value of 1035.
    Note that we still know that b,c>a, but possibly b,c>x.
    Now we get the next number y.
    When y<a<x, obviously we can discard it.
    
    When y>x>a, we replace x with y (the root now has (y, a=1000))
    => we saved log(m) steps here, since x will never have to bubble down.
    
    When x>y>a, we need to bubble down y recursively as required.
    
    Worst-case run time is still O(n log m).
    But I think the average run time might be O(n log log m) or something like that.
    In any case, it is obviously a faster implementation.
    
  • 2020-11-30 20:12

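    I store the first 100 numbers in a max-heap of size 100.

    • At the last level, I keep track of the minimum number. Each new number is compared with that minimum to decide whether it is a candidate for the top 100.

      -- If it is, I insert it and call reheapify again, so I always have a max-heap of the top 100.

      Because the heap never holds more than 100 elements, each update costs a bounded amount of work, so the total is linear in n, which is better than O(n log n).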
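
    For comparison, here is a minimal Python sketch of the more common variant of this idea. Note that it swaps in a min-heap rather than the max-heap described above, so the smallest of the current top 100 sits at the root and no separate minimum has to be tracked. `top_k_heap` is a hypothetical name; `heapq` is Python's standard-library binary heap.

```python
import heapq


def top_k_heap(numbers, k=100):
    """Return the k largest values in descending order using a bounded min-heap."""
    if k <= 0:
        return []
    heap = []  # min-heap: heap[0] is the smallest of the current top k
    for x in numbers:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            # Evict the current minimum and insert x in one O(log k) step.
            heapq.heapreplace(heap, x)
    return sorted(heap, reverse=True)
```

    Each number costs at most O(log k) with k = 100, and most numbers are rejected after a single comparison with `heap[0]`.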

  • 2020-11-30 20:17

    Ok, here is a really stupid answer, but it is a valid one:

    • Load all 100 million entries into an array
    • Call some quick sort implementation on it
    • Take the last 100 items (if it sorts ascending), or the first 100 if you can sort descending.
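
    The three steps above can be sketched in a few lines of Python. This is only an illustration: `top_k_sorted` is a hypothetical name, and the built-in `sorted` (Timsort) stands in for "some quick sort implementation".

```python
def top_k_sorted(numbers, k=100):
    """Sort everything ascending, then return the last k items, largest first."""
    if k <= 0:
        return []
    ordered = sorted(numbers)  # loads and sorts the whole input in memory
    return ordered[-k:][::-1]
```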

    Reasoning:

    • There is no context on the question, so efficiency can be argued - what IS efficient? Computer time or programmer time?
    • This method can be implemented very quickly.
    • 100 million numbers take only a few hundred MB (about 400 MB as 32-bit integers), so any decent workstation can simply run that.

    It is an OK solution for some sort of one-time operation. It would suck running it x times per second or something. But then, we need more context, as mclientk also assumed with his simple SQL statement. Assuming the 100 million numbers do not exist in memory is a reasonable position, because they may come from a database, and most of the time they will when we are talking about business-relevant numbers.

    As such, the question is really hard to answer - efficiency first has to be defined.
