I am currently working on an algorithm to implement a rolling median filter (analogous to a rolling mean filter) in C. From my search of the literature, there appear to be t
It is maybe worth pointing out that there is a special case which has a simple exact solution: when all the values in the stream are integers within a (relatively) small defined range. For instance, assume they must all lie between 0 and 1023. In this case just define an array of 1024 elements and a count, and clear all of these values. For each value in the stream increment the corresponding bin and the count. After the stream ends find the bin which contains the count/2 highest value - easily accomplished by adding successive bins starting from 0. Using the same method the value of an arbitrary rank order may be found. (There is a minor complication if detecting bin saturation and "upgrading" the size of the storage bins to a larger type during a run will be needed.)
This special case may seem artificial, but in practice it is very common. It can also be applied as an approximation for real numbers if they lie within a range and a "good enough" level of precision is known. This would hold for pretty much any set of measurements on a group of "real world" objects. For instance, the heights or weights of a group of people. Not a big enough set? It would work just as well for the lengths or weights of all the (individual) bacteria on the planet - assuming somebody could supply the data!
It looks like I misread the original - which seems like it wants a sliding window median instead of the just the median of a very long stream. This approach still works for that. Load the first N stream values for the initial window, then for the N+1th stream value increment the corresponding bin while decrementing the bin corresponding to the 0th stream value. It is necessary in this case to retain the last N values to allow the decrement, which can be done efficiently by cyclically addressing an array of size N. Since the position of the median can only change by -2,-1,0,1,2 on each step of the sliding window, it isn't necessary to sum all the bins up to the median on each step, just adjust the "median pointer" depending upon which side(s) bins were modified. For instance, if both the new value and the one being removed fall below the current median then it doesn't change (offset = 0). The method breaks down when N becomes too large to hold conveniently in memory.