Finding the median of a large set of numbers too big to fit into memory

梦如初夏 2020-12-23 02:12

I was asked this question in an interview recently.

There are N numbers, too many to fit into memory. They are split across k database tables (unsorted), each of whi

7 answers
  •  Happy的楠姐
    2020-12-23 03:07

    Here is what I would do:

    1. Sample the data to get a general idea about the distribution.

    2. Using that information, choose a "bucket" (a value range) wide enough to contain the median yet narrow enough that the numbers falling inside it fit into memory.

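    3. In one pass (O(N)), count the numbers below the bucket (L1_size) and above the bucket (L3_size), and store the numbers that fall within the range in the bucket (L2). The bucket contains the median exactly when L1_size <= N/2 < L1_size + L2_size. If it does not, or if the bucket grows too large for memory, go back to step 2 and choose a different range.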

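    4. Use quickselect or another selection method to find the element of rank k = N/2 - L1_size (0-based) within the bucket. The L1_size numbers below the bucket all precede it, so this element has rank N/2 overall and is the median. (For even N, also take rank k - 1 and average the two if that definition of the median is wanted.)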

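    This requires O(N) steps for each pass over the data plus O(L2_size) expected steps for quickselect, so O(N) overall when the first bucket choice succeeds.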
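
The steps above could be sketched roughly as follows. This is only an illustration under several assumptions that are not in the question: `scan` is a hypothetical callable that returns a fresh iterator over all N numbers (standing in for re-reading the k tables), the sample is drawn by reservoir sampling, the bucket is taken as a band of sample quantiles around the sample median, and for even N the lower of the two middle values is returned. The names `sample_size`, `margin`, `max_bucket` and `max_tries` are made up for the example.

```python
import random
from typing import Callable, Iterable, List, Tuple

Scan = Callable[[], Iterable[float]]  # hypothetical: re-reads all tables


def choose_bucket(scan: Scan, sample_size: int, margin: float) -> Tuple[float, float]:
    """Steps 1-2: reservoir-sample the data, return a range around the sample median."""
    sample: List[float] = []
    for i, x in enumerate(scan()):
        if i < sample_size:
            sample.append(x)
        else:
            j = random.randint(0, i)  # inclusive on both ends
            if j < sample_size:
                sample[j] = x
    sample.sort()
    m = len(sample)
    lo_idx = max(0, int(m * (0.5 - margin)))
    hi_idx = min(m - 1, int(m * (0.5 + margin)))
    return sample[lo_idx], sample[hi_idx]


def partition_pass(scan: Scan, lo: float, hi: float,
                   max_bucket: int) -> Tuple[int, int, List[float]]:
    """Step 3: count values below the range and collect values inside it.

    Values stop being stored once the bucket exceeds max_bucket,
    but they are still counted so the caller can detect the overflow.
    """
    below = in_bucket = 0
    bucket: List[float] = []
    for x in scan():
        if x < lo:
            below += 1
        elif x <= hi:
            in_bucket += 1
            if in_bucket <= max_bucket:
                bucket.append(x)
    return below, in_bucket, bucket


def quickselect(items: List[float], k: int) -> float:
    """Step 4: return the k-th smallest (0-based) item in expected linear time."""
    items = list(items)
    while True:
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        equal = sum(1 for x in items if x == pivot)
        if k < len(lows):
            items = lows
        elif k < len(lows) + equal:
            return pivot
        else:
            k -= len(lows) + equal
            items = [x for x in items if x > pivot]


def external_median(scan: Scan, sample_size: int = 1000, margin: float = 0.05,
                    max_bucket: int = 100_000, max_tries: int = 20) -> float:
    """Lower median of the scanned values, keeping at most max_bucket of them in memory."""
    n = sum(1 for _ in scan())
    if n == 0:
        raise ValueError("no data")
    target = (n - 1) // 2  # 0-based overall rank of the lower median
    for _ in range(max_tries):
        lo, hi = choose_bucket(scan, sample_size, margin)
        below, in_bucket, bucket = partition_pass(scan, lo, hi, max_bucket)
        if in_bucket > max_bucket:
            margin /= 2  # bucket too big for memory: narrow the range
        elif below <= target < below + in_bucket:
            return quickselect(bucket, target - below)
        else:
            margin = min(0.5, margin * 2)  # median missed: widen the range
    raise RuntimeError("could not find a suitable bucket")


if __name__ == "__main__":
    data = list(range(1, 100_002))
    random.shuffle(data)
    print(external_median(lambda: iter(data)))
```

In this sketch each attempt costs two passes (one to sample, one to count and fill the bucket). If a single value occurs more than `max_bucket` times around the median, no range is narrow enough and the function gives up after `max_tries` attempts; handling that case would need per-value counting, which is left out here.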
