Finding median of large set of numbers too big to fit into memory

梦如初夏 2020-12-23 02:12

I was asked this question in an interview recently.

There are N numbers, too many to fit into memory. They are split across k database tables (unsorted), each of whi

7 answers
  •  梦毁少年i
    2020-12-23 02:52

    If an approximate answer is sufficient, a method similar to @piccolbo's works well. I'll assume all the data points are integers; if they are not, multiply by ten or a hundred or whatever it takes to normalize them to integers. Make one pass over the data to calculate the average (arithmetic mean), and call that number the provisional median. Then make a second pass over the data. If a data point is less than the provisional median, reduce the provisional median by one. If it is greater, increase the provisional median by one. If it is equal, leave the provisional median unchanged. At the end of the data, return the provisional median. The provisional median will move around at first, but as long as the data arrive in no particular order, a point is more likely to push it up when it sits below the true median and more likely to push it down when it sits above. So it eventually settles into a small range that is very close to the actual median.
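
The two passes described above can be sketched roughly as follows. This is only an illustration under some assumptions: the names `approximate_median` and `make_pass` are hypothetical, the data are already integers, and `make_pass()` can be called twice to stream the same values again (for example by re-reading the tables one row at a time). The mean is rounded down with integer division, a detail the answer does not specify.

```python
from typing import Callable, Iterable


def approximate_median(make_pass: Callable[[], Iterable[int]]) -> int:
    """Two-pass approximate median using constant memory.

    make_pass is a hypothetical callback that returns a fresh iterable
    over the same integers each time it is called.
    """
    # Pass 1: compute the arithmetic mean as the provisional median.
    total = 0
    count = 0
    for x in make_pass():
        total += x
        count += 1
    if count == 0:
        raise ValueError("cannot take the median of an empty data set")
    estimate = total // count  # assumption: round the mean down to an integer

    # Pass 2: nudge the estimate one step toward every data point.
    for x in make_pass():
        if x < estimate:
            estimate -= 1
        elif x > estimate:
            estimate += 1
        # equal: leave the estimate unchanged
    return estimate


if __name__ == "__main__":
    print(approximate_median(lambda: [3, 1, 2]))  # prints 2
```

Only the running total, the count, and the current estimate are held in memory, which is why this fits the constraint in the question. The result depends on the order in which the data are read: feeding the same three values in sorted order (1, 2, 3) ends on 3 rather than 2, so the estimate is only trustworthy when the input order is not systematically increasing or decreasing.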
