Finding median of large set of numbers too big to fit into memory

梦如初夏 2020-12-23 02:12

I was asked this question in an interview recently.

There are N numbers, too many to fit into memory. They are split across k database tables (unsorted), each of whi

7 Answers
  •  猫巷女王i
    2020-12-23 03:01

    Another way to look at this is to go back to the definition of "median." Authors word it differently, but essentially the median is the value that splits a probability distribution into two halves of equal probability: half of the mass lies below it and half above it.
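    For a finite data set the same idea can be made exact without sorting: the (lower) median is the smallest value m such that at least half of the N numbers are <= m, and checking that only requires counting. The sketch below assumes, beyond what the question states, that the values are integers and that every table can be scanned again on demand; `ChunkSource`, `count_at_most` and `lower_median` are hypothetical names, and `source` stands in for "scan all k tables". It binary-searches over the value range with one counting pass per step, so it needs O(1) memory and about log2(max - min) passes.

```python
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical stand-in for the k database tables: every call to a ChunkSource
# starts a fresh full scan and yields one iterable of values per table.
ChunkSource = Callable[[], Iterator[Iterable[int]]]


def count_and_range(source: ChunkSource) -> Tuple[int, int, int]:
    """One pass over all tables: total count, minimum and maximum."""
    n, lo, hi = 0, None, None
    for chunk in source():
        for v in chunk:
            n += 1
            if lo is None or v < lo:
                lo = v
            if hi is None or v > hi:
                hi = v
    if n == 0:
        raise ValueError("no data")
    return n, lo, hi


def count_at_most(source: ChunkSource, x: int) -> int:
    """One pass over all tables: how many values are <= x."""
    return sum(1 for chunk in source() for v in chunk if v <= x)


def lower_median(source: ChunkSource) -> int:
    """Smallest value m with at least ceil(N/2) values <= m (integers only)."""
    n, lo, hi = count_and_range(source)
    target = (n + 1) // 2  # 1-based rank of the lower median
    # Invariant: the answer lies in [lo, hi].
    while lo < hi:
        mid = (lo + hi) // 2
        if count_at_most(source, mid) >= target:
            hi = mid  # at least half the data is <= mid, so the answer is <= mid
        else:
            lo = mid + 1  # too few values <= mid, so the answer is > mid
    return lo


if __name__ == "__main__":
    tables = [[5, 1], [9, 3], [7]]  # three tiny in-memory "tables"
    print(lower_median(lambda: iter(tables)))  # prints 5
```

    Each binary-search step is a full scan of all k tables, so this trades many sequential passes (about 32 for values spanning the full 32-bit range) for constant memory. The per-table counts in a step are independent, so they could also be computed in parallel and summed.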

    So instead of spending a lot of effort sorting enormous data sets, estimate the distribution and find the middle. For symmetric distributions the median equals the mean, which can be computed in a single pass with almost no memory. Also, if an exact answer isn't necessary, you can use Pearson's empirical relationship for moderately skewed, unimodal distributions: mean - mode ≈ 3 * (mean - median).
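    A rough sketch of "estimate the distribution and find the middle", assuming the tables can be streamed twice: one pass collects the count, sum, minimum and maximum, and a second pass fills a fixed-size, equal-width histogram (the default of 1000 buckets is an arbitrary illustrative choice). The median is read off where the cumulative count crosses N/2, with linear interpolation inside that bucket, so its accuracy is on the order of one bucket width. For comparison it also applies the empirical mean/mode/median relationship from the answer, taking the mode to be the midpoint of the fullest bucket; that choice, and all function names here, are assumptions for illustration.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical stand-in for the k unsorted tables: each call re-scans all rows.
ChunkSource = Callable[[], Iterator[Iterable[float]]]


def summarize(source: ChunkSource) -> Tuple[int, float, float, float]:
    """Pass 1: count, sum, minimum and maximum."""
    n, total = 0, 0.0
    lo, hi = float("inf"), float("-inf")
    for chunk in source():
        for v in chunk:
            n += 1
            total += v
            lo = min(lo, v)
            hi = max(hi, v)
    if n == 0:
        raise ValueError("no data")
    return n, total, lo, hi


def histogram(source: ChunkSource, lo: float, width: float, buckets: int) -> List[int]:
    """Pass 2: counts per equal-width bucket starting at lo."""
    counts = [0] * buckets
    for chunk in source():
        for v in chunk:
            # Clamp so the maximum value lands in the last bucket.
            i = min(int((v - lo) / width), buckets - 1)
            counts[i] += 1
    return counts


def estimate_median(source: ChunkSource, buckets: int = 1000) -> Tuple[float, float]:
    """Return (histogram estimate, Pearson rule-of-thumb estimate)."""
    n, total, lo, hi = summarize(source)
    width = (hi - lo) / buckets or 1.0  # avoid zero width if all values are equal
    counts = histogram(source, lo, width, buckets)

    # Walk the cumulative counts to the bucket containing the N/2-th value and
    # interpolate linearly inside it (assumes values are spread evenly there).
    half, seen = n / 2, 0
    hist_est = hi  # always overwritten: the cumulative count reaches N/2
    for i, c in enumerate(counts):
        if seen + c >= half:
            hist_est = lo + (i + (half - seen) / c) * width
            break
        seen += c

    # mean - mode ~= 3 * (mean - median), solved for the median; the mode is
    # approximated by the midpoint of the fullest bucket (an assumption).
    mean = total / n
    mode = lo + (counts.index(max(counts)) + 0.5) * width
    pearson_est = mean - (mean - mode) / 3
    return hist_est, pearson_est


if __name__ == "__main__":
    # Symmetric, unimodal toy data: value v repeated 51 - |v - 50| times.
    tables = [[v] * (51 - abs(v - 50)) for v in range(101)]
    print(estimate_median(lambda: iter(tables), buckets=101))  # both close to 50
```

    Equal-width buckets lose resolution on heavily skewed data; one refinement is a third pass that histograms only the values falling inside the median's bucket. The Pearson estimate is only a rule of thumb and can be far off for data that are not unimodal (a uniform distribution, for example).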
