Finding median of large set of numbers too big to fit into memory

梦如初夏 2020-12-23 02:12

I was asked this question in an interview recently.

There are N numbers, too many to fit into memory. They are split across k database tables (unsorted), each of whi

7 answers

Happy的楠姐 (OP)

2020-12-23 03:07
Here is what I would do:
1. Sample the data to get a rough idea of its distribution.
2. Using that distribution, choose a "bucket" (a value range) wide enough to contain the median but narrow enough that the numbers falling inside it fit into memory.
3. In one O(N) pass, count the numbers below the bucket (L1_size) and above it (L3_size), and keep the numbers inside the range in memory (L2). Let m = ⌈N/2⌉ be the rank of the median, counting from 1. The bucket contains the median exactly when L1_size < m ≤ L1_size + L2_size. If it does not, go back to step 2 and choose a different bucket.
4. Use quickselect (or another selection method) to find the element of rank m − L1_size within the bucket; that element is the overall median.
This takes O(N) + O(L2_size) steps when the first bucket is chosen well; each retry costs another O(N) pass.
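A minimal Python sketch of the four steps above, under several assumptions that are not part of the original answer: each of the k tables can be scanned sequentially (modelled here as plain Python lists), `memory_limit` is the number of values the bucket may hold, and "median" means the lower median (0-indexed rank (N − 1) // 2, i.e. rank ⌈N/2⌉ counting from 1). The helper names `scan`, `reservoir_sample`, `quickselect` and `median_by_bucketing`, the reservoir-sampling step (one extra pass; a real database might use its own sampling feature instead), and the retry policy for the bucket width are all illustrative choices.

```python
import math
import random
from typing import Iterator, List, Sequence


def scan(tables: Sequence[Sequence[float]]) -> Iterator[float]:
    """One sequential pass over every table (stand-in for reading each DB table)."""
    for table in tables:
        yield from table


def reservoir_sample(stream: Iterator[float], size: int, rng: random.Random) -> List[float]:
    """Uniform random sample of up to `size` items from a stream (Algorithm R)."""
    sample: List[float] = []
    for i, x in enumerate(stream):
        if i < size:
            sample.append(x)
        else:
            j = rng.randrange(i + 1)
            if j < size:
                sample[j] = x
    return sample


def quickselect(values: List[float], k: int, rng: random.Random) -> float:
    """Return the element of 0-indexed rank k in `values` (expected linear time)."""
    while True:
        pivot = rng.choice(values)
        lows = [x for x in values if x < pivot]
        n_equal = sum(1 for x in values if x == pivot)
        if k < len(lows):
            values = lows
        elif k < len(lows) + n_equal:
            return pivot
        else:
            k -= len(lows) + n_equal
            values = [x for x in values if x > pivot]


def median_by_bucketing(
    tables: Sequence[Sequence[float]],
    memory_limit: int,
    sample_size: int = 500,
    margin: float = 0.1,
    max_passes: int = 5,
    seed: int = 0,
) -> float:
    """Lower median (0-indexed rank (N - 1) // 2) without holding all N values."""
    rng = random.Random(seed)
    # Step 1: sample the data to estimate its distribution.
    sample = sorted(reservoir_sample(scan(tables), sample_size, rng))
    if not sample:
        raise ValueError("no data")
    s = len(sample)
    for _ in range(max_passes):
        # Step 2: bucket = sample quantiles 0.5 - margin .. 0.5 + margin,
        # left unbounded on a side whose quantile falls past the sample's ends.
        lo_i, hi_i = int((0.5 - margin) * s), int((0.5 + margin) * s)
        lo = sample[lo_i] if lo_i > 0 else -math.inf
        hi = sample[hi_i] if hi_i < s else math.inf
        # Step 3: one pass counting values below/above the bucket, keeping values inside it.
        below = above = 0
        bucket: List[float] = []
        for x in scan(tables):
            if x < lo:
                below += 1
            elif x > hi:
                above += 1
            else:
                bucket.append(x)
                if len(bucket) > memory_limit:
                    break  # bucket too big for memory: abandon this pass
        if len(bucket) > memory_limit:
            margin /= 2  # narrow the bucket and rescan
            continue
        target = (below + len(bucket) + above - 1) // 2
        if below <= target < below + len(bucket):
            # Step 4: select the right rank inside the in-memory bucket.
            return quickselect(bucket, target - below, rng)
        margin *= 2  # median lies outside the bucket: widen it and rescan
    raise RuntimeError("no suitable bucket found within max_passes")
```

For example, `median_by_bucketing([[5, 1, 9], [3, 7], [2, 8]], memory_limit=10)` returns 5. The answer leaves open how to choose a new bucket after a miss; this sketch widens the bucket when the median falls outside it, narrows it when it overflows memory, and gives up after `max_passes` scans rather than risk looping indefinitely.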