Retrieving the top 100 numbers from one hundred million numbers

北荒

One of my friends was asked the question

Retrieving the top 100 (max) numbers from one hundred million numbers

in a rece

12 Answers

感情败类

2020-11-30 20:09
By TOP 100, do you mean 100 largest? If so:
```
SELECT TOP 100 Number FROM RidiculouslyLargeTable ORDER BY Number DESC
```
Make sure you tell the interviewer that you assume the table is indexed properly.
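As a rough, self-contained sketch of the same idea, the snippet below uses Python's built-in `sqlite3` module with an in-memory table. `TOP 100` is SQL Server syntax, so the query is written with SQLite's `LIMIT 100` instead. The table and column names come from the answer; the index name `idx_number`, the helper `top_100_via_sql`, and the random sample data are assumptions made purely for illustration.

```python
import random
import sqlite3

def top_100_via_sql(numbers):
    """Load numbers into an in-memory SQLite table and query the 100 largest."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE RidiculouslyLargeTable (Number INTEGER)")
    conn.executemany(
        "INSERT INTO RidiculouslyLargeTable (Number) VALUES (?)",
        ((n,) for n in numbers),
    )
    # The index the answer assumes exists: with it, the engine can walk the
    # largest values in order rather than sorting the whole table.
    conn.execute("CREATE INDEX idx_number ON RidiculouslyLargeTable (Number)")
    rows = conn.execute(
        "SELECT Number FROM RidiculouslyLargeTable ORDER BY Number DESC LIMIT 100"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]

# Small illustrative data set; the real table would hold 100 million rows.
sample = [random.randrange(1_000_000) for _ in range(10_000)]
print(top_100_via_sql(sample)[:5])
```

Whether the index actually pays off depends on the database engine and on how often the query runs, which is the context the interviewer would need to supply.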

我寻月下人不归

2020-11-30 20:09

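```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define TOP 100

/* Fill result[] with the TOP largest values of numbers[0..n-1], in descending order. */
void top_numbers(const int *numbers, size_t n, int result[TOP])
{
    /* Start from INT_MIN rather than 0 so that negative inputs are handled. */
    for (int j = 0; j < TOP; j++)
        result[j] = INT_MIN;

    for (size_t i = 0; i < n; i++)           /* n is 100000000 in the question */
    {
        for (int j = 0; j < TOP; j++)
        {
            if (numbers[i] > result[j])
            {
                if (j < TOP - 1)
                {
                    /* Shift result[j..TOP-2] down by one; the ranges overlap, so memmove, not memcpy. */
                    memmove(result + j + 1, result + j, (TOP - 1 - j) * sizeof(int));
                }
                result[j] = numbers[i];
                break;
            }
        }
    }
}
```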

野的像风

2020-11-30 20:11

Mergesort in batches of 100, then only keep the top 100.

Incidentally, you can scale this in all sorts of directions, including concurrently.
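One possible reading of this suggestion, sketched below under stated assumptions: read the input in batches of 100, sort each batch, merge it with the best 100 found so far (the merge step of mergesort), and keep only the first 100 of the merged run. The function name `top_k_by_batches` is hypothetical, and `heapq.merge` is used here only as a ready-made merge of two sorted runs.

```python
import heapq
from itertools import islice

def top_k_by_batches(numbers, k=100):
    """Return the k largest values, in descending order, processing k items at a time."""
    it = iter(numbers)
    best = []  # descending, never longer than k
    while True:
        # Sort one batch of at most k items, largest first.
        batch = sorted(islice(it, k), reverse=True)
        if not batch:
            return best
        # Merge two descending runs and keep only the first k of the result.
        best = list(islice(heapq.merge(best, batch, reverse=True), k))

print(top_k_by_batches([3, 1, 2], k=2))  # prints [3, 2]
```

Because each batch is independent until the merge, separate workers could each reduce their own slice of the input to a top 100, and the partial results could then be merged the same way.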

时光说笑

2020-11-30 20:11

@darius's answer can actually be improved!
By "pruning", i.e. deferring the heap-replace operation until it is required.

Suppose we have a=1000 at the top of the (min-)heap.
It has children b and c.
We know that b,c>1000.

      a=1000
  +-----|-----+
 b>a         c>a




We now read the next number x=1035.
Since x>a we should discard a.
Instead we store (x=1035, a=1000) at the root.
We do not (yet) bubble down the new value of 1035.
Note that we still know that b,c>a, but possibly b,c<x.
Now we get the next number y:

when y<a<x then obviously we can discard it

when y>x>a then we replace x with y (the root now has (y, a=1000))
=> we saved log(m) steps here, since x will never have to bubble down

when x>y>a then we need to bubble down y recursively as required

Worst-case run time is still O(n log m), with m = 100 the heap size.
But the average run time, I think, might be O(n log log m) or something.
In any case, it is obviously a faster implementation.

长情又很酷

2020-11-30 20:12
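I store the first 100 numbers in a max-heap of size 100.
- At the last level, I keep track of the minimum number. Each incoming number is checked against this minimum to decide whether it is a candidate for the top 100.
- If it is, I insert it in place of the minimum and call reheapify again, so I always have a max-heap of the top 100.

Since the heap never holds more than m = 100 elements, each reheapify costs O(log m), so the reheapify work over n numbers is O(n log m) rather than O(n log n).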
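The answer above keeps a max-heap and tracks its minimum separately. A more common arrangement, swapped in here as an assumption rather than the answerer's exact method, is a min-heap of size 100: its root is the smallest of the current top 100, so each incoming number needs only one comparison against the root. A minimal sketch using Python's `heapq` module (the function name `top_k_min_heap` is hypothetical):

```python
import heapq

def top_k_min_heap(numbers, k=100):
    """Return the k largest values in descending order using a size-k min-heap."""
    if k <= 0:
        return []
    heap = []  # min-heap: heap[0] is the smallest of the current top k
    for x in numbers:
        if len(heap) < k:
            heapq.heappush(heap, x)        # still filling the first k slots
        elif x > heap[0]:
            heapq.heapreplace(heap, x)     # pop the smallest, push x, O(log k)
        # otherwise x cannot be in the top k and is skipped in O(1)
    return sorted(heap, reverse=True)

print(top_k_min_heap([5, 1, 9, 3, 7], k=3))  # prints [9, 7, 5]
```

With n inputs and a heap of size m, this does at most n heap operations of O(log m) each, and only m numbers are ever held in memory, so the input can be streamed.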

逝去的感伤

2020-11-30 20:17
Ok, here is a really stupid answer, but it is a valid one:
- Load all 100 million entries into an array
- Call some quick sort implementation on it
- Take last 100 items (it sorts ascending), or first 100 if you can sort descending.
Reasoning:
- There is no context on the question, so efficiency can be argued - what IS efficient? Computer time or programmer time?
- This method can be implemented very quickly.
- 100 million entries, as numbers, are just a few hundred MB (about 400 MB as 4-byte integers), so every decent workstation can simply run that.
It is an OK solution for some sort of one-time operation. It would suck to run it x times per second or something. But then we need more context, as mclientk also pointed out with his simple SQL statement: assuming the 100 million numbers do not exist in memory is a feasible reading of the question, because they may come from a database, and most of the time will when we are talking about business-relevant numbers.

As such, the question is really hard to answer - efficiency first has to be defined.
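The load, sort and slice steps listed in this answer can be sketched in a few lines. The function name `top_k_by_full_sort` is hypothetical, and the memory figure assumes 4-byte integers, which the question does not actually specify.

```python
def top_k_by_full_sort(numbers, k=100):
    """Sort everything ascending and return the last k items."""
    if k <= 0:
        return []
    data = sorted(numbers)   # O(n log n) over the whole input
    return data[-k:]         # the k largest, still in ascending order

print(top_k_by_full_sort([5, 1, 9, 3, 7], k=2))  # prints [7, 9]

# Back-of-the-envelope memory check, assuming 4-byte integers.
n = 100_000_000
bytes_per_int = 4
print(n * bytes_per_int / 1_000_000, "MB")  # prints 400.0 MB
```

This trades computer time for programmer time, which is exactly the point the answer makes about having to define efficiency first.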