Finding median of large set of numbers too big to fit into memory

梦如初夏 2020-12-23 02:12

I was asked this question in an interview recently.

There are N numbers, too many to fit into memory. They are split across k database tables (unsorted), each of whi

7 Answers
  • 2020-12-23 03:11

    There are a few potential solutions:

    • External merge sort - O(n log n)
      Sort the numbers on disk in a first pass, then read off the middle element (or the two middle elements, if n is even) in a second pass.
    • Order statistics distributed selection algorithm - O(n)
      Reduce the problem to the classic one of finding the kth smallest number in an unsorted array, with k = n/2.
    • Counting sort histogram - O(n)
      This requires an assumption about the range of the numbers: can a counter per possible value (or per bucket of values) fit in memory?
    • If anything is known about the distribution of the numbers, other algorithms become possible.
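
The histogram option in the list above can be sketched roughly as follows. This is only an illustration under assumptions that are not in the question: the values are integers within a known range [lo, hi], and `make_stream` is a hypothetical callable that re-reads all k tables from the start each time it is called. Rather than one counter per value, it keeps a fixed number of bucket counters (at least 2) and narrows the range on every pass, so memory stays bounded even when the range is large.

```python
from itertools import chain
from typing import Callable, Iterable


def kth_smallest(make_stream: Callable[[], Iterable[int]], k: int,
                 lo: int, hi: int, buckets: int = 1024) -> int:
    """Return the k-th smallest value (0-indexed).

    Assumes every value is an integer in [lo, hi] and buckets >= 2.
    """
    while lo < hi:
        size = hi - lo + 1
        width = (size + buckets - 1) // buckets  # ceil(size / buckets)
        counts = [0] * buckets
        for x in make_stream():  # one full pass over the data
            if lo <= x <= hi:
                counts[(x - lo) // width] += 1
        # Find the bucket holding rank k, then narrow [lo, hi] to that bucket.
        for i, c in enumerate(counts):
            if k < c:
                lo, hi = lo + i * width, min(hi, lo + (i + 1) * width - 1)
                break
            k -= c  # skip everything in the buckets below
        else:
            raise IndexError("k is out of range for the data")
    return lo


def median(make_stream: Callable[[], Iterable[int]],
           lo: int, hi: int, buckets: int = 1024) -> float:
    n = sum(1 for _ in make_stream())  # counting pass
    if n == 0:
        raise ValueError("median of empty input")
    upper = kth_smallest(make_stream, n // 2, lo, hi, buckets)
    if n % 2:
        return float(upper)
    lower = kth_smallest(make_stream, n // 2 - 1, lo, hi, buckets)
    return (lower + upper) / 2


# Three small in-memory lists stand in for the database tables.
tables = [[5, 1, 9], [3, 7], [2, 8, 4, 6]]
print(median(lambda: chain.from_iterable(tables), lo=0, hi=10, buckets=4))  # 5.0
```

Each pass shrinks the candidate range by roughly a factor of `buckets`, so the number of passes grows with the logarithm of the range rather than with n.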

    For more details and implementation see:
    http://www.fusu.us/2013/07/median-in-large-set-across-1000-servers.html
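
For comparison, here is a minimal single-machine sketch of the external merge sort option. It is not taken from the linked article. It assumes the numbers are integers arriving as one iterable (`stream`, a stand-in for reading the tables one after another) and that `run_size` numbers fit in memory at once. Sorted runs are spilled to temporary files and merged lazily with `heapq.merge`, stopping as soon as the middle is reached.

```python
import heapq
import tempfile
from itertools import islice
from typing import Iterable


def external_median(stream: Iterable[int], run_size: int = 1_000_000) -> float:
    it = iter(stream)
    runs, n = [], 0
    try:
        # Pass 1: sort memory-sized chunks and spill each one to a temp file.
        while True:
            chunk = sorted(islice(it, run_size))
            if not chunk:
                break
            n += len(chunk)
            f = tempfile.TemporaryFile(mode="w+")
            runs.append(f)
            f.writelines(f"{x}\n" for x in chunk)
            f.seek(0)
        if n == 0:
            raise ValueError("median of empty input")
        # Pass 2: k-way merge of the sorted runs, consumed only up to the middle.
        merged = heapq.merge(*((int(line) for line in f) for f in runs))
        middle = list(islice(merged, (n - 1) // 2, n // 2 + 1))
        return (middle[0] + middle[-1]) / 2  # one element if n is odd, two if even
    finally:
        for f in runs:
            f.close()


print(external_median([5, 1, 9, 3, 7, 2, 8, 4, 6], run_size=4))  # 5.0
```

This keeps one open file per run, so with a very large number of runs the merge would have to be done in several rounds to stay under the operating system's open-file limit.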
