Python implementation of “median of medians” algorithm

前端 未结 2 1632
爱一瞬间的悲伤
爱一瞬间的悲伤 2020-12-06 08:49

I\'ve written this implementation of the median of medians algorithm in python, but it doesn\'t seem to output the right result, and it also does not seem of linear complexi

相关标签:
2条回答
  • 2020-12-06 09:19

    Below is my PYTHON implementation. For more speed, you might want to use PYPY instead.

    For your question about SPEED: The theoretical speed for 5 numbers per column is ~10N, so I use 15 numbers per column, for a 2X speed at ~5N, while the optimal speed is ~4N. But, I could be wrong about the optimal speed of the most state-of-art solution. In my own test, my program runs slightly faster than the one using sort(). Certainly, your mileage may vary.

    Assuming the python program is "median.py", an example to run it is "python ./median.py 100". For speed benchmark, you might want to comment out the validation code, and use PYPY.

    #!/bin/python
    #
    # TH @stackoverflow, 2016-01-20, linear time "median of medians" algorithm
    #
    import sys, random
    
    
    items_per_column = 15
    
    
    def find_i_th_smallest( A, i ):
        t = len(A)
        if(t <= items_per_column):
            # if A is a small list with less than items_per_column items, then:
            #     1. do sort on A
            #     2. return the i-th smallest item of A
            #
            return sorted(A)[i]
        else:
            # 1. partition A into columns of items_per_column items each. items_per_column is odd, say 15.
            # 2. find the median of every column
            # 3. put all medians in a new list, say, B
            #
            B = [ find_i_th_smallest(k, (len(k) - 1)/2) for k in [A[j:(j + items_per_column)] for j in range(0,len(A),items_per_column)]]
    
            # 4. find M, the median of B
            #
            M = find_i_th_smallest(B, (len(B) - 1)/2)
    
            # 5. split A into 3 parts by M, { < M }, { == M }, and { > M }
            # 6. find which above set has A's i-th smallest, recursively.
            #
            P1 = [ j for j in A if j < M ]
            if(i < len(P1)):
                return find_i_th_smallest( P1, i)
            P3 = [ j for j in A if j > M ]
            L3 = len(P3)
            if(i < (t - L3)):
                return M
            return find_i_th_smallest( P3, i - (t - L3))
    
    
    # How many numbers should be randomly generated for testing?
    #
    number_of_numbers = int(sys.argv[1])
    
    
    # create a list of random positive integers
    #
    L = [ random.randint(0, number_of_numbers) for i in range(0, number_of_numbers) ]
    
    
    # Show the original list
    #
    print L
    
    
    # This is for validation
    #
    print sorted(L)[int((len(L) - 1)/2)]
    
    
    # This is the result of the "median of medians" function.
    # Its result should be the same as the validation.
    #
    print find_i_th_smallest( L, (len(L) - 1) / 2)
    
    0 讨论(0)
  • 2020-12-06 09:20

    1) Your code indentation is wrong, try this:

    def select(L):
        if len(L) < 10:
            L.sort()
            return L[int(len(L)/2)]
        S = []
        lIndex = 0
        while lIndex+5 < len(L)-1:
            S.append(L[lIndex:lIndex+5])
            lIndex += 5
        S.append(L[lIndex:])
        Meds = []
        for subList in S:
            print(subList)
            Meds.append(select(subList))
        L2 = select(Meds)
        L1 = L3 = []
        for i in L:
            if i < L2:
                L1.append(i)
            if i > L2:
                L3.append(i)
        if len(L) < len(L1):
            return select(L1)
        elif len(L) > len(L1) + 1:
            return select(L3)
        else:
            return L2
    

    2) The method you use does not return the median, it just return a number which is not so far from the median. To get the median, you need to count how many number are greater than your pseudo-median, if a majority is greater, repeat the algorithm with the numbers greater than the pseudo-median, else repeat with the other numbers.

    def select(L, j):
        if len(L) < 10:
            L.sort()
            return L[j]
        S = []
        lIndex = 0
        while lIndex+5 < len(L)-1:
            S.append(L[lIndex:lIndex+5])
            lIndex += 5
        S.append(L[lIndex:])
        Meds = []
        for subList in S:
            Meds.append(select(subList, int((len(subList)-1)/2)))
        med = select(Meds, int((len(Meds)-1)/2))
        L1 = []
        L2 = []
        L3 = []
        for i in L:
            if i < med:
                L1.append(i)
            elif i > med:
                L3.append(i)
            else:
                L2.append(i)
        if j < len(L1):
            return select(L1, j)
        elif j < len(L2) + len(L1):
            return L2[0]
        else:
            return select(L3, j-len(L1)-len(L2))
    

    Warning: L = M = [] is not L = [] and M = []

    0 讨论(0)
提交回复
热议问题