Why is my MergeSort so slow in Python?

春和景丽 2021-01-14 15:20

I'm having some trouble understanding this behaviour. I'm measuring the execution time with the timeit module and get the following results for 10000 cyc

4 answers
  •  猫巷女王i
    2021-01-14 15:54

    For starters: I cannot reproduce your timing results with 100 cycles and lists of size 10000. The exhaustive timeit benchmark of all implementations discussed in this answer (including bubblesort and your original snippet) is posted as a gist here. I find the following results for the average duration of a single run, in seconds:

    • Python's native (Tim)sort : 0.0144600081444
    • Bubblesort : 26.9620819092
    • (Your) Original Mergesort : 0.224888720512

    Now, to make your function faster, you can do a few things.

    • Edit: well, apparently I was wrong on that one (thanks, cwillu): length computation is O(1) in Python. Still, removing the redundant length computations everywhere improves things a bit (original Mergesort: 0.224888720512, no-length Mergesort: 0.195795390606):

      def nolenmerge(array1,array2):
          merged_array=[]
          while array1 or array2:
              if not array1:
                  merged_array.append(array2.pop(0))
              elif (not array2) or array1[0] < array2[0]:
                  merged_array.append(array1.pop(0))
              else:
                  merged_array.append(array2.pop(0))
          return merged_array
      
      def nolenmergeSort(array):
          n  = len(array)
          if n <= 1:
              return array
          left = array[:n // 2]   # floor division: valid index on Python 2 and 3
          right = array[n // 2:]
          return nolenmerge(nolenmergeSort(left),nolenmergeSort(right))
      
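    • Second, as suggested in this answer, pop(0) is linear in the length of the list, because every remaining element has to shift down by one. Rewrite your merge to pop() from the end, which is O(1), and reverse the result once at the end: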

      def fastmerge(array1,array2):
          merged_array=[]
          while array1 or array2:
              if not array1:
                  merged_array.append(array2.pop())
              elif (not array2) or array1[-1] > array2[-1]:
                  merged_array.append(array1.pop())
              else:
                  merged_array.append(array2.pop())
          merged_array.reverse()
          return merged_array
      

      This is again faster: no-len Mergesort: 0.195795390606, no-len Mergesort+fastmerge: 0.126505711079
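
An alternative that avoids `pop` altogether is to walk both lists with indices. The sketch below is not one of the variants benchmarked in this answer, and the name `index_merge` is made up for illustration. Unlike `fastmerge`, it leaves its inputs untouched and needs no final `reverse()`; whether it is actually faster would have to be measured.

```python
def index_merge(left, right):
    """Merge two already-sorted lists without mutating either of them."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # <= takes equal elements from `left` first, so the merge stays stable
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these two slices is non-empty
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

It is a drop-in replacement wherever the merge step receives two sorted lists, for example `index_merge(sorted_left, sorted_right)`.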

    • Third (this is only useful as-is in a language that does tail call optimization; without it, it's a bad idea): your call to mergeSort is not tail-recursive. It calls both (mergeSort left) and (mergeSort right) recursively while there is still work remaining in the call (merge).

      But you can make the mergesort tail-recursive by using continuation-passing style (CPS). Without TCO, this runs out of stack space even for modest lists:

      def cps_merge_sort(array):
          return cpsmergeSort(array,lambda x:x)
      
      def cpsmergeSort(array,continuation):
          n  = len(array)
          if n <= 1:
              return continuation(array)
          left = array[:n // 2]
          right = array[n // 2:]
          return cpsmergeSort (left, lambda leftR:
                               cpsmergeSort(right, lambda rightR:
                                            continuation(fastmerge(leftR,rightR))))
      

      Once this is done, you can do TCO by hand, moving the call-stack management that recursion normally does into the while loop of an ordinary function. This is trampolining (explained e.g. here; the trick is originally due to Guy Steele). Trampolining and CPS work great together.

      You write a thunking function that "records" and delays application: it takes a function and its arguments, and returns a zero-argument function that, when called, returns the original function applied to those arguments.

      thunk = lambda name, *args: lambda: name(*args)
      

      You then write a trampoline that manages calls to thunks: it keeps calling thunks until one returns a result (as opposed to another thunk). Note that this relies on the final result not being callable itself:

      def trampoline(bouncer):
          while callable(bouncer):
              bouncer = bouncer()
          return bouncer
      

      Then all that's left is to "freeze" (thunk) all your recursive calls from the original CPS function, to let the trampoline unwrap them in proper sequence. Your function now returns a thunk, without recursion (and discarding its own frame), at every call:

      def tco_cpsmergeSort(array,continuation):
          n  = len(array)
          if n <= 1:
              return continuation(array)
          left = array[:n // 2]
          right = array[n // 2:]
          return thunk (tco_cpsmergeSort, left, lambda leftR:
                        thunk (tco_cpsmergeSort, right, lambda rightR:
                               (continuation(fastmerge(leftR,rightR)))))
      
      mycpomergesort = lambda l: trampoline(tco_cpsmergeSort(l,lambda x:x))
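
To see the thunk/trampoline pair in isolation, here is a minimal sketch on a smaller problem. The function `sum_to` and the depth of 100000 are made up for illustration and are not part of the original answer. The point is only that the trampolined version finishes, whereas plain recursion of the same depth would exceed CPython's default recursion limit.

```python
def thunk(fn, *args):
    """Delay a call: return a zero-argument function that performs it later."""
    return lambda: fn(*args)

def trampoline(bouncer):
    """Keep calling thunks until a non-callable result comes back."""
    while callable(bouncer):
        bouncer = bouncer()
    return bouncer

def sum_to(n, acc=0):
    """Tail-recursive sum of 1..n that returns a thunk instead of recursing."""
    if n == 0:
        return acc
    return thunk(sum_to, n - 1, acc + n)

# Every bounce runs inside the trampoline's while loop,
# so the Python call stack stays shallow regardless of n.
total = trampoline(sum_to(100000))
```

The same shape carries over to the CPS mergesort above: each recursive call is wrapped in `thunk(...)`, and the outermost call is handed to `trampoline`.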
      

    Sadly, this is not that fast (recursive mergesort: 0.126505711079, this trampolined version: 0.170638551712). I guess the stack blowup of the recursive merge sort is in fact modest: as soon as you leave the leftmost path in the array-slicing recursion pattern, the algorithm starts returning (and removing frames). So for 10K-sized lists, the function stack is at most about log_2(10 000) ≈ 13.3, i.e. 14 frames deep, which is pretty modest.
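
The depth estimate can be checked with a few lines. This is only a sketch: `max_split_depth` is a hypothetical helper that follows the larger (right-hand) half at every split and counts the nested recursive calls below the initial one.

```python
import math

def max_split_depth(n):
    """Number of nested recursive calls needed to split n items down to 1."""
    depth = 0
    while n > 1:
        n = n - n // 2   # size of the larger half after splitting at n // 2
        depth += 1
    return depth

depth_10k = max_split_depth(10000)
# Compare with the logarithmic bound, rounded up
bound_10k = math.ceil(math.log2(10000))
```

For 10000 elements both values come out as 14, far below CPython's default recursion limit.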

    You can do a slightly more involved, stack-based recursion elimination, in the style of what this SO answer gives:

        def leftcomb(l):
            maxn,leftcomb = len(l),[]
            n = maxn // 2
            while maxn > 1:
                leftcomb.append((l[n:maxn],False))
                maxn,n = n,n // 2
            return l[:maxn],leftcomb
    
        def tcomergesort(l):
            l,stack = leftcomb(l)
            while stack: # l sorted, stack contains tagged slices
                i,ordered = stack.pop()
                if ordered:
                    l = fastmerge(l,i)
                else:
                    stack.append((l,True)) # store return call
                    rsub,ssub = leftcomb(i)
                    stack.extend(ssub) #recurse
                    l = rsub
            return l
    

    But this is only a tad faster (trampolined mergesort: 0.170638551712, this stack-based version: 0.144994809628). Apparently, the stack building that Python does for the recursive calls of our original merge sort is pretty inexpensive.

    The final results? On my machine (Ubuntu natty's stock Python 2.7.1+), the average run timings (over 100 runs, except for Bubblesort; lists of size 10000 containing random integers in the range 0-10000000) are:

    • Python's native (Tim)sort : 0.0144600081444
    • Bubblesort : 26.9620819092
    • Original Mergesort : 0.224888720512
    • no-len Mergesort : 0.195795390606
    • no-len Mergesort + fastmerge : 0.126505711079
    • trampolined CPS Mergesort + fastmerge : 0.170638551712
    • stack-based mergesort + fastmerge: 0.144994809628
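
The gist with the actual benchmark is not reproduced here. A rough harness along the following lines should produce comparable per-run averages. The function name `average_runtime`, its parameters, and the fixed seed are assumptions for illustration, and absolute numbers will differ by machine and Python version.

```python
import random
import timeit

def average_runtime(sort_fn, size=10000, runs=100, max_value=10000000, seed=0):
    """Average wall-clock seconds for one call of sort_fn on a random list."""
    rng = random.Random(seed)
    data = [rng.randint(0, max_value) for _ in range(size)]
    # Hand each run a fresh copy in case sort_fn mutates its argument.
    # The copy is included in the timing, which slightly inflates every result.
    total = timeit.timeit(lambda: sort_fn(list(data)), number=runs)
    return total / runs

if __name__ == "__main__":
    # Baseline: Python's built-in sort
    print(average_runtime(sorted))
```

Passing any of the mergesort variants above in place of `sorted` gives a like-for-like comparison on the same input data.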
