How to calculate or approximate the median of a list without storing the list

你的背包 2020-11-28 22:24

I'm trying to calculate the median of a set of values, but I don't want to store all the values as that could blow memory requirements. Is there a way of calculating or ap

10 answers
  •  南方客 (OP)
    2020-11-28 22:32

    If the input is known to fall within a bounded integer range, say 1 to 1 million, you can keep an array of counts (one counter per possible value) instead of the values themselves. See the code for "quantile" and "ibucket" here: http://code.google.com/p/ea-utils/source/browse/trunk/clipper/sam-stats.cpp
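
    As a rough illustration of the array-of-counts idea, here is a minimal sketch. It is not the linked "quantile"/"ibucket" code; the class name `CountingMedian` and its methods are made up for this example, and it assumes integer inputs with bounds known up front.

```python
import math


class CountingMedian:
    """Quantiles of integers known to lie in [lo, hi], using one counter per value.

    Hypothetical helper for illustration; memory is O(hi - lo), not O(number of inputs).
    """

    def __init__(self, lo, hi):
        self.lo = lo
        self.hi = hi
        self.counts = [0] * (hi - lo + 1)
        self.n = 0

    def add(self, value):
        if not self.lo <= value <= self.hi:
            raise ValueError(f"{value} is outside [{self.lo}, {self.hi}]")
        self.counts[value - self.lo] += 1
        self.n += 1

    def quantile(self, q):
        """Smallest value v such that at least q * n of the inputs are <= v."""
        if self.n == 0:
            raise ValueError("no data")
        target = max(1, math.ceil(q * self.n))
        running = 0
        for offset, count in enumerate(self.counts):
            running += count
            if running >= target:
                return self.lo + offset
```

    Calling `quantile(0.5)` walks the counters until half of the inputs have been passed. For an even number of inputs this sketch returns the lower of the two middle values rather than their average.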

    This solution generalizes to an approximation: coerce each input into an integer within some range using a function that you then reverse on the way out, i.e. foo.push((int)(input / 1000000)) and quantile(foo) * 1000000. The result is then only accurate to within the bucket width (1000000 in this example).
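
    The coerce-then-reverse step might look like the sketch below. The function name `approx_median` and the `bucket_width` parameter are assumptions for this example, and a `Counter` stands in for the fixed-size count array so the sketch stays short; memory grows with the number of distinct buckets, not with the number of inputs.

```python
import math
from collections import Counter


def approx_median(values, bucket_width):
    """Approximate median of a stream by counting values per bucket.

    Hypothetical helper: returns the lower edge of the bucket that holds the
    median, so the answer can be low by up to bucket_width.
    """
    counts = Counter()
    n = 0
    for x in values:
        # Forward mapping: coerce the input to an integer bucket index.
        counts[math.floor(x / bucket_width)] += 1
        n += 1
    if n == 0:
        raise ValueError("no data")

    target = (n + 1) // 2  # rank of the (lower) median
    running = 0
    for bucket in sorted(counts):
        running += counts[bucket]
        if running >= target:
            # Reverse mapping: turn the bucket index back into a value.
            return bucket * bucket_width
```

    For example, with inputs 1200000, 3700000, 2500000, 9000000 and 2900000 and a bucket width of 1000000, the sketch returns 2000000, while the exact median is 2900000.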

    If your input is an arbitrary double-precision number, you have to autoscale your histogram as out-of-range values come in (see above).
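
    One possible way to autoscale, sketched under my own assumptions (this is not taken from the linked code): keep a bounded number of bins and, whenever a new value would stretch the histogram beyond that bound, double the bin width and merge neighbouring bins. The names `AutoscalingHistogram`, `max_bins` and `initial_width` are hypothetical.

```python
import math


class AutoscalingHistogram:
    """Histogram with a bounded number of bins whose width doubles on demand."""

    def __init__(self, max_bins=1024, initial_width=1.0):
        self.max_bins = max_bins
        self.width = initial_width
        self.counts = {}  # bin index (floor(x / width)) -> count
        self.lo = None    # smallest bin index seen
        self.hi = None    # largest bin index seen
        self.n = 0

    def add(self, x):
        idx = math.floor(x / self.width)
        self.counts[idx] = self.counts.get(idx, 0) + 1
        self.n += 1
        self.lo = idx if self.lo is None else min(self.lo, idx)
        self.hi = idx if self.hi is None else max(self.hi, idx)
        # Coarsen until the occupied span fits in max_bins again.
        while self.hi - self.lo + 1 > self.max_bins:
            self._double_width()

    def _double_width(self):
        merged = {}
        for idx, count in self.counts.items():
            # Floor division keeps negative bins aligned as well.
            merged[idx // 2] = merged.get(idx // 2, 0) + count
        self.counts = merged
        self.lo //= 2
        self.hi //= 2
        self.width *= 2

    def median(self):
        """Midpoint of the bin that contains the (lower) median."""
        if self.n == 0:
            raise ValueError("no data")
        target = (self.n + 1) // 2
        running = 0
        for idx in sorted(self.counts):
            running += self.counts[idx]
            if running >= target:
                return (idx + 0.5) * self.width
```

    The trade-off is that resolution only ever gets coarser: a single extreme outlier can widen every bin, so the estimate is accurate only to within the final bin width.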

    Or you can use the median-triplets method described in this paper: http://web.cs.wpi.edu/~hofri/medsel.pdf
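
    The paper's algorithm is not reproduced here. The sketch below only illustrates the general idea the name suggests, adapted to a stream by me: take the median of every three inputs, then the median of every three of those medians, and so on, so that only a couple of pending values per level are stored. The class name `TripletMedian` and the way leftover values are handled are assumptions of this example.

```python
class TripletMedian:
    """Streaming approximate median via repeated medians of three.

    levels[k] holds at most two pending values, each summarising 3**k inputs,
    so memory grows with the logarithm of the stream length.
    """

    def __init__(self):
        self.levels = []

    def add(self, x):
        k = 0
        while True:
            if k == len(self.levels):
                self.levels.append([])
            buf = self.levels[k]
            buf.append(x)
            if len(buf) < 3:
                return
            # Buffer is full: pass its median up one level and reset it.
            x = sorted(buf)[1]
            buf.clear()
            k += 1

    def estimate(self):
        """Use the highest non-empty level; lower leftovers are ignored."""
        for buf in reversed(self.levels):
            if buf:
                return sorted(buf)[(len(buf) - 1) // 2]
        raise ValueError("no data")
```

    The estimate is cleanest when the number of inputs is a power of three; for other lengths this sketch simply discards whatever is still pending at the lower levels, which makes it rougher.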
