How does random shuffling in quick sort help in increasing the efficiency of the code?

前端 未结 5 985
忘掉有多难
忘掉有多难 2020-12-28 20:00

I was going through lecture videos by Robert Sedgwick on algorithms, and he explains that random shuffling ensures we don\'t get to encounter the worst case quadratic time s

相关标签:
5条回答
  • 2020-12-28 20:04

    The assumption is that the worst case -- everything already sorted -- is frequent enough to be worth worrying about, and a shuffle is a black-magic least-effort sloppy way to avoid that case without having to admit that by improving that case you're moving the problem to another one, which happened to get randomly shuffled into sorted order. Hopefully that bad case is a much rarer situation, and even if it does come up the randomness means the problem can't easily be reproduced and blamed on this cheat.

    The concept of improving a common case at the expense of a rare one is fine. The randomness as an alternative to actually thinking about which cases will be more or less common is somewhat sloppy.

    0 讨论(0)
  • 2020-12-28 20:07

    What does a random shuffle do to the distribution on the input space? To understand this, let's look at a probability distribution, P, defined over a set S, where P is not in our control. Let us create a probability distribution P' by applying a random shuffle, over S to P. In other words, every time we get a sample from P, we map it, uniformly at random to an element of S. What can you say about this resulting distribution P'?

    P'(x) = summation over all elements s in S of P(s)*1/|S| = 1/|S|
    

    Thus, P' is just the uniform distribution over S. A random shuffle gives us control over the input probability distribution.

    How is this relevant to quicksort? Well, we know the average complexity of quicksort. This is computed wrt the uniform probability distribution and that is a property we want to maintain on our input distribution, irrespective of what it really is. To achieve that, we do a random shuffle of our input array, ensuring that the distribution is not adversarial in any way.

    0 讨论(0)
  • 2020-12-28 20:10

    It's really an admission that although we often talk about average case complexity, we don't in practice expect every case to turn up with the same probability.

    Sorting an already sorted array is worst case in quicksort, because whenever you pick a pivot, you discover that all the elements get placed on the same side of the pivot, so you don't split into two roughly equal halves at all. And often in practice this already sorted case will turn up more often than other cases.

    Randomly shuffling the data first is a quick way of ensuring that you really do end up with all cases turning up with equal probability, and therefore that this worst case will be as rare as any other case.

    It's worth noting that there are other strategies that deal well with already sorted data, such as choosing the middle element as the pivot.

    0 讨论(0)
  • 2020-12-28 20:16

    In case of randomized QuickSort, since the pivot element is randomly chosen, we can expect the split of the input array to be reasonably well balanced on average - as opposed to the case of 1 and (n-1) split in a non randomized version of the algorithm. This helps in preventing the worst-case behavior of QuickSort which occurs in unbalanced partitioning.

    Hence, the average case running time of the randomized version of QuickSort is O(nlogn) and not O(n^2);

    0 讨论(0)
  • 2020-12-28 20:19

    Is the video in coursera? Unfortunately, shuffle decrease performance to O(N^2) with data n,n,...,n,1,1,...,1. I have inspected Quick.java with nn11.awk that generate such data.

    $ for N in 10000 20000 30000 40000; do time ./nn11.awk $N | java Quick; done | awk 'NF>1'
    
    real    0m10.732s
    user    0m10.295s
    sys     0m0.948s
    
    real    0m48.057s
    user    0m44.968s
    sys     0m3.193s
    
    real    1m52.109s
    user    1m48.158s
    sys     0m3.634s
    
    real    3m38.336s
    user    3m31.475s
    sys     0m6.253s
    
    0 讨论(0)
提交回复
热议问题