I was asked this question in an interview recently.
There are N numbers, too many to fit into memory. They are split across k database tables (unsorted), each of whi
Here is what I would do:
Sample the data to get a general idea about the distribution.
Using the information about the distribution, choose a "bucket" (a range), large enough to get the median inside and small enough to fit into the memory.
With one pass (O(N)) count the numbers before the bucket (L1_size), after the bucket (L3_size) and put numbers within the range into the bucket (L2). You will see if the chosen bucket contains the median. If not - go to step 2.
Use quickselect or other method to find the k=(L1_size + L2_size/2) element in the bucket.
Requires O(N) + O(L2_size) steps.