Worst case of the Quicksort algorithm

Submitted by 核能气质少年 on 2020-12-08 06:41:56

Question


I found many implementations of quick sort algorithm, but at the end I decided to stick to this one:

public static void quickSort(int array[], int start, int end) {
    if (end <= start) {
        return;
    }
    int pivot = array[start];   // first element as the pivot
    int temp;
    int i = start + 1;          // boundary of the "less than pivot" region

    for (int j = start + 1; j <= end; j++) {
        if (pivot > array[j]) {
            temp = array[j];
            array[j] = array[i];
            array[i] = temp;
            i++;
        }
    }
    // put the pivot between the two partitions
    array[start] = array[i - 1];
    array[i - 1] = pivot;
    quickSort(array, start, i - 2);
    quickSort(array, i, end);
}

There are several things I'm confused about.
Some people suggest taking the first element as the pivot, others say to pick the middle element, and some say you should pick the last element. Wouldn't the choice make a difference?
Let's say I'm trying to show why, if the array is already sorted, quicksort has O(n^2) as its worst-case order of growth.
I have the following array:
{1, 2, 3, 4, 5, 6}.
If I pick the first element as my pivot, won't it just compare the pivot to every other element, swap it with itself, and take O(n)? Then it proceeds to these two recursive calls, which I assumed were O(log n):

quickSort(array, start, i-2);
quickSort(array, i, end);

So in the end, even for an already-ordered list of integers, wouldn't it still be O(n log n)?

If I decided to pick the last element as my pivot instead, wouldn't it behave completely differently? It would swap 6 and 1, and hence perform operations completely different from those performed when the pivot was the first element in the array.

I just don't understand why the worst case is O(n^2).

Any help will be greatly appreciated!


Answer 1:


The whole point of quicksort is to find a pivot that partitions the array into two approximately equal pieces. That's where the log(n) factor comes from.

Suppose there is an array of size n and at each iteration you can partition the array into equal parts. Then we have:

T(n) = 2 * T(n/2) + O(n)
     = 4 * T(n/4) + 2 * O(n)
     .
     .
     (log(n) steps)
     .
     .
     = 2^log(n) * T(1) + log(n) * O(n)
     = n * O(1) + O(n * log(n))
     = O(n * log(n))

Now, if we partition the array into sizes say 1 and n-1, we get:

T(n) = T(1) + T(n-1) + O(n) = T(n-1) + O(n)
     = T(n-2) + O(n-1) + O(n)
     = T(n-3) + O(n-2) + O(n-1) + O(n)
     .
     .
     ((n-1) steps)
     .
     .
     = T(1) + O(2) + O(3) + ... + O(n)
     = O(1 + 2 + 3 + ... + n)
     = O(n^2)

In the case you mention, the two recursive calls will not each be O(log(n)). If the array is sorted, one will be O(1) and the other will be T(n-1). Hence you get the O(n^2) complexity.

quickSort(array, start, i-2); // should be constant time
quickSort(array, i, end); // should be T(n-1)

And as @MarkRansom points out in the comments, this is not exclusive to sorted arrays. In general, if you choose pivots such that the array is partitioned very unevenly, you'll run into such worst-case complexities. For example, if the array is not sorted but you always choose the maximum (or minimum, depending on your implementation) as the pivot, you'll run into the same problem.
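To see this concretely, here is a small instrumented sketch (the class name, counter, and input sizes are my own) that counts the comparisons made by the question's first-element-pivot quicksort on sorted versus shuffled input:

```java
import java.util.Random;

public class PivotDemo {
    static long comparisons = 0;

    // First-element-pivot quicksort, as in the question,
    // instrumented with a comparison counter.
    static void quickSort(int[] a, int start, int end) {
        if (end <= start) return;
        int pivot = a[start];
        int i = start + 1;
        for (int j = start + 1; j <= end; j++) {
            comparisons++;
            if (pivot > a[j]) {
                int t = a[j]; a[j] = a[i]; a[i] = t;
                i++;
            }
        }
        a[start] = a[i - 1];
        a[i - 1] = pivot;
        quickSort(a, start, i - 2);
        quickSort(a, i, end);
    }

    public static void main(String[] args) {
        int n = 1000;

        // Already sorted: every partition is 0 / (n-1), so the cost is
        // (n-1) + (n-2) + ... + 1 = n(n-1)/2 comparisons.
        int[] sorted = new int[n];
        for (int k = 0; k < n; k++) sorted[k] = k;
        comparisons = 0;
        quickSort(sorted, 0, n - 1);
        System.out.println("sorted input:   " + comparisons); // prints 499500 = 1000*999/2

        // Shuffled: partitions are roughly balanced on average,
        // giving O(n log n) comparisons — far fewer than n(n-1)/2.
        int[] shuffled = new int[n];
        for (int k = 0; k < n; k++) shuffled[k] = k;
        Random rng = new Random(42);
        for (int k = n - 1; k > 0; k--) {         // Fisher-Yates shuffle
            int r = rng.nextInt(k + 1);
            int t = shuffled[k]; shuffled[k] = shuffled[r]; shuffled[r] = t;
        }
        comparisons = 0;
        quickSort(shuffled, 0, n - 1);
        System.out.println("shuffled input: " + comparisons);
    }
}
```

On sorted input the count is exactly the quadratic sum from the recurrence above; on shuffled input it is roughly an order of magnitude smaller for n = 1000.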




Answer 2:


QuickSort starts by moving everything that's got a higher value than the pivot value to the end of the list, and everything that's got a lower value to the beginning of the list.

If the value at your pivot point is the lowest value in the list, then every value in the list will be moved to the end of the list. However, just determining where to move all of those values requires O(n) work. If you then pick the second-lowest value, and then the third-lowest value, etc., then you'll end up doing O(n) work n/2 times. O(n²/2) simplifies to O(n²).
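Spelled out, the repeated O(n) passes over ever-shorter remainders sum to an arithmetic series:

```latex
(n-1) + (n-2) + \dots + 2 + 1 = \frac{n(n-1)}{2} = O(n^2)
```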

Why do some people suggest taking the first element as the pivot, others the middle element, and some the last element? Wouldn't it be different?

It's all a matter of trying to guess (without scanning the whole list) which element is most likely to be close to the median of your data set, thereby giving you as close to the best-case behavior as possible.

  • If your data is totally random, then it doesn't matter what you choose--you're equally likely to get a good pivot point, and your chances of consistently choosing the worst pivot point are very slim. Choosing the first or last value is the simplest option that works.
  • If your data is presorted (or mostly so), choosing the middle is probably going to get you one of the best values, whereas choosing the first or last element will consistently give you the worst pivot points.

In real life, the likelihood of dealing with data that's mostly presorted is high enough that it's probably worth the slightly higher complexity of code. The Wikipedia section on this topic may be worth reading.
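One common form of that "slightly higher complexity" is median-of-three pivot selection. A minimal sketch (class and method names are my own):

```java
public class MedianOfThree {
    // Return the index of whichever of a[lo], a[mid], a[hi] holds the
    // median of the three values. On presorted input this picks the
    // middle element, giving a balanced split instead of the worst case.
    static int medianOfThreePivot(int[] a, int lo, int hi) {
        int mid = lo + (hi - lo) / 2;
        int x = a[lo], y = a[mid], z = a[hi];
        if ((x <= y && y <= z) || (z <= y && y <= x)) return mid;
        if ((y <= x && x <= z) || (z <= x && x <= y)) return lo;
        return hi;
    }

    public static void main(String[] args) {
        int[] sorted = {1, 2, 3, 4, 5, 6};
        // Sorted input: the middle index is chosen.
        System.out.println(medianOfThreePivot(sorted, 0, 5)); // prints 2
    }
}
```

A quicksort would then swap the chosen element into pivot position before partitioning, as the median-of-3 code in Answer 3 below does with its three conditional swaps.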




Answer 3:


Below is a quicksort that uses median of 3 and limits stack complexity to O(log(n)) by recursing only on the smaller part, then looping back for the larger part. Worst-case time complexity is still O(n^2), but reaching it would require median of 3 to repeatedly choose very small or very large values. Time complexity can be limited to O(n log(n)) by using median of medians, but the overhead makes the average case much slower (I'm wondering if it ends up slower than heap sort; with median of medians it's definitely slower than merge sort, though a standard merge sort needs a second array the same size, or half the size, of the original array).

http://en.wikipedia.org/wiki/Median_of_medians

Introsort solves the worst case time complexity by switching to heap sort based on the level of recursion.

http://en.wikipedia.org/wiki/Introsort

#include <cstddef>   // size_t
#include <cstdint>   // uint32_t
#include <utility>   // std::swap

void QuickSort(uint32_t a[], size_t lo, size_t hi) {
    while(lo < hi){
        size_t i = lo, j = (lo+hi)/2, k = hi;
        uint32_t p;
        if (a[k] < a[i])            // median of 3
            std::swap(a[k], a[i]);
        if (a[j] < a[i])
            std::swap(a[j], a[i]);
        if (a[k] < a[j])
            std::swap(a[k], a[j]);
        p = a[j];
        i--;                        // Hoare partition
        k++;
        while (1) {
            while (a[++i] < p);
            while (a[--k] > p);
            if (i >= k)
                break;
            std::swap(a[i], a[k]);
        }
        i = k++;
        // recurse on smaller part, loop on larger part
        if((i - lo) <= (hi - k)){
            QuickSort(a, lo, i);
            lo = k;
        } else {
            QuickSort(a, k, hi);
            hi = i;
        }
    }
}


Source: https://stackoverflow.com/questions/40476477/worst-case-of-the-quicksort-algorithm
