I've written this implementation of the median of medians algorithm in Python, but it doesn't seem to output the right result, and it also does not seem of linear complexity
Below is my Python implementation. For more speed, you might want to run it with PyPy instead.
As for your question about speed: the theoretical cost with 5 numbers per column is ~10N, so I use 15 numbers per column, which halves that to ~5N, while the optimal cost is ~4N. I could be wrong about the optimal figure for the state-of-the-art solution, though. In my own test, my program runs slightly faster than the one using sort(). Certainly, your mileage may vary.
Assuming the Python program is saved as "median.py", you can run it with "python ./median.py 100". For a speed benchmark, you might want to comment out the validation code and use PyPy.
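One way to arrive at figures like these is sketched below. It is a back-of-the-envelope model, not necessarily how the numbers were originally obtained: it assumes each recursion level costs about N units of work, ignores rounding, and ignores the fact that sorting a larger column costs more. The function name `linear_constant` is made up for illustration.

```python
from fractions import Fraction

def linear_constant(items_per_column):
    """Return c such that T(N) = c*N solves the assumed worst-case recurrence
    T(N) = T(N/g) + T(r*N) + N, where g is the (odd) column size."""
    g = items_per_column
    k = (g - 1) // 2                      # g = 2k + 1
    # At least half of the N/g column medians are <= M, and each of those
    # columns holds k + 1 items <= M, so at least (k + 1)/(2g) of A is dropped.
    worst_remaining = 1 - Fraction(k + 1, 2 * g)
    medians = Fraction(1, g)              # size of B relative to A
    shrink = 1 - medians - worst_remaining
    if shrink <= 0:
        raise ValueError("recurrence is not linear for this column size")
    return 1 / shrink

for g in (5, 15, 101):
    print(g, float(linear_constant(g)))
```

Under this model the constant is 10 for 5 items per column and 5 for 15 items per column, and it approaches 4 as the columns grow. A column size of 3 raises the `ValueError`, because the two recursive calls together no longer shrink the problem.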
#!/bin/python
#
# TH @stackoverflow, 2016-01-20, linear time "median of medians" algorithm
#
import sys, random
items_per_column = 15
def find_i_th_smallest( A, i ):
    t = len(A)
    if(t <= items_per_column):
        # if A is a small list with at most items_per_column items, then:
        #     1. do sort on A
        #     2. return the i-th smallest item of A (i is 0-based)
        #
        return sorted(A)[i]
    else:
        # 1. partition A into columns of items_per_column items each. items_per_column is odd, say 15.
        # 2. find the median of every column
        # 3. put all medians in a new list, say, B
        #
        B = [ find_i_th_smallest(k, (len(k) - 1)//2) for k in [A[j:(j + items_per_column)] for j in range(0,len(A),items_per_column)]]
        # 4. find M, the median of B
        #
        M = find_i_th_smallest(B, (len(B) - 1)//2)
        # 5. split A into 3 parts by M, { < M }, { == M }, and { > M }
        # 6. find which above set has A's i-th smallest, recursively.
        #
        P1 = [ j for j in A if j < M ]
        if(i < len(P1)):
            return find_i_th_smallest( P1, i)
        P3 = [ j for j in A if j > M ]
        L3 = len(P3)
        if(i < (t - L3)):
            return M
        return find_i_th_smallest( P3, i - (t - L3))
# How many numbers should be randomly generated for testing?
#
number_of_numbers = int(sys.argv[1])
# create a list of random non-negative integers
#
L = [ random.randint(0, number_of_numbers) for i in range(0, number_of_numbers) ]
# Show the original list
#
print(L)
# This is for validation
#
print(sorted(L)[(len(L) - 1)//2])
# This is the result of the "median of medians" function.
# Its result should be the same as the validation.
#
print(find_i_th_smallest( L, (len(L) - 1)//2))
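The validation above checks only the median of a single random list. A somewhat broader check, sketched here for Python 3, compares every rank against sorted() on a list that is likely to contain duplicates. The names `select`, `check_all_ranks`, and `ITEMS_PER_COLUMN` are made up for this sketch; `select` is a compact restatement of the function above, not a different algorithm.

```python
import random

ITEMS_PER_COLUMN = 15  # same column size as in the program above

def select(A, i):
    """Return the i-th smallest item of A (0-based), median-of-medians style."""
    t = len(A)
    if t <= ITEMS_PER_COLUMN:
        return sorted(A)[i]
    # Median of each column, then the median of those medians as the pivot M.
    columns = [A[j:j + ITEMS_PER_COLUMN] for j in range(0, t, ITEMS_PER_COLUMN)]
    B = [select(c, (len(c) - 1) // 2) for c in columns]
    M = select(B, (len(B) - 1) // 2)
    # Three-way split around M; recurse only into the part that holds rank i.
    P1 = [x for x in A if x < M]
    if i < len(P1):
        return select(P1, i)
    P3 = [x for x in A if x > M]
    if i < t - len(P3):
        return M
    return select(P3, i - (t - len(P3)))

def check_all_ranks(n, seed=0):
    """Compare select() with sorted() for every rank of one random list."""
    rng = random.Random(seed)
    L = [rng.randint(0, n) for _ in range(n)]  # duplicates are likely
    expected = sorted(L)
    return all(select(L, i) == expected[i] for i in range(n))

print(check_all_ranks(200))
```

Fixing the seed keeps a failing case reproducible; varying `n` and `seed` gives more coverage at the cost of run time.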