Question
Why is QuickSort bad at sorting almost sorted data? In comparison, why is insertion sort better? Trying to understand Big O notation!
Answer 1:
Your statement is true for certain variants of quicksort, depending on how the pivot is chosen. Quicksort's performance depends on the pivoting operation dividing the data into two approximately equally sized chunks, which are then sorted separately. If the pivot is the minimum or maximum of the data, or lies at a very high or low percentile, the pivoting operation splits the data into two parts where almost all of the data ends up on one side, and that side still needs to be sorted. If the first element is chosen as the pivot and the data is already sorted, this worst-case scenario occurs on every recursive call.

By choosing a random element as the pivot instead, the worst case has a negligible chance of occurring. This does not change the worst-case analysis, but in expectation over the pivot choices (even for a worst-case input), and in practice, it yields good performance.
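A small counting sketch can make this concrete. The function below (an illustration assumed for this answer, not code from the question) counts comparisons made by a simple out-of-place quicksort under a pluggable pivot strategy, then compares the first-element strategy against a random pivot on already-sorted input:

```python
import random
import sys

def quicksort_comparisons(items, choose_pivot_index):
    """Sort a copy of `items` with quicksort, returning the comparison count."""
    count = 0

    def sort(sub):
        nonlocal count
        if len(sub) <= 1:
            return sub
        i = choose_pivot_index(sub)
        pivot = sub[i]
        rest = sub[:i] + sub[i + 1:]
        count += len(rest)  # each remaining element is compared to the pivot once
        left = [x for x in rest if x < pivot]
        right = [x for x in rest if x >= pivot]
        return sort(left) + [pivot] + sort(right)

    sort(list(items))
    return count

sys.setrecursionlimit(10000)  # first-element pivot on sorted data recurses deeply
data = list(range(500))       # already sorted

first = quicksort_comparisons(data, lambda s: 0)
rand = quicksort_comparisons(data, lambda s: random.randrange(len(s)))
print(first)  # 124750 comparisons: n*(n-1)/2, the quadratic worst case
print(rand)   # typically a few thousand, on the order of n*log(n)
```

On sorted input, the first-element pivot produces an empty left partition every time, so the comparison count is exactly 499 + 498 + … + 1 = 124750, while a random pivot stays close to n log n.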
Answer 2:
Quicksort's algorithm is as follows:
- Select a "pivot" value from the elements in the list.
- Reorder the list so that all values are in their correct position relative to the pivot (e.g. if we want to sort the list in ascending order then all values less than the pivot would go before the pivot, and all values greater than the pivot would go after the pivot).
- Quicksort the sections of the list before and after the pivot.
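The three steps above can be sketched directly in Python (a minimal out-of-place version, using the first element as the pivot for illustration):

```python
def quicksort(values):
    """Minimal quicksort following the three steps above."""
    if len(values) <= 1:
        return values
    pivot, rest = values[0], values[1:]        # step 1: select a pivot
    before = [v for v in rest if v <= pivot]   # step 2: reorder around the pivot
    after = [v for v in rest if v > pivot]
    return quicksort(before) + [pivot] + quicksort(after)  # step 3: recurse

print(quicksort([3, 6, 1, 5, 2, 4]))  # [1, 2, 3, 4, 5, 6]
```

Production implementations partition in place rather than building new lists, but the recursive structure is the same.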
Whether the assertion that it performs poorly with sorted/nearly-sorted lists is even true depends entirely upon how step 1 is performed. What is the pivot? Say I'm trying to sort the following list into ascending order:
1, 2, 3, 4, 5, 6
Well, let's consider step 1. Which value do I use as a pivot? If we designed our code under the assumption that the list order is random, we'd probably just use the first element, since any pivot is equally likely to be good when the order is completely random. In this case, however, the two sub-lists that need to be sorted are extremely uneven: the first is empty, and the second contains all the remaining values, 2, 3, 4, 5, 6. When we sort that sub-list, we use 2 as the pivot and find that the exact same thing happens again. Ultimately, this means every value is compared to every other value.

If we had selected 3 as the pivot instead, the remaining values would be split into 1, 2 and 4, 5, 6. As a result, 1 would be compared to 2, but neither would ever need to be compared to any of the values in 4, 5, 6. Now consider how 4, 5, 6 would be sorted. If 4 were selected as the pivot, 4 would be compared to 5 and 6, and then 5 would still need to be compared to 6 in the next iteration. Conversely, were 5 our pivot, 5 would be compared to 4 and 6, but 4 and 6 would never be compared to each other.
Note that this problem is the same for cases where the list is in perfectly reversed order as well.
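This symmetry is easy to check with a quick comparison count (a sketch assumed for illustration, using the first-element pivot described above):

```python
def count_comparisons(values):
    """Comparisons made by quicksort when the first element is the pivot."""
    if len(values) <= 1:
        return 0
    pivot, rest = values[0], values[1:]
    left = [v for v in rest if v < pivot]
    right = [v for v in rest if v >= pivot]
    return len(rest) + count_comparisons(left) + count_comparisons(right)

print(count_comparisons(list(range(1, 101))))      # sorted: 4950 = 100*99/2
print(count_comparisons(list(range(100, 0, -1))))  # reversed: also 4950
```

For a reversed list the right partition is empty on every call instead of the left one, but the total comparison count is identical: n(n-1)/2.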
Of course, a solution could be to use a different technique for choosing a pivot.
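One common such technique (my example, not something the answer prescribes) is median-of-three: take the median of the first, middle, and last elements. On a sorted list this picks a value near the middle, giving a near-even split:

```python
def median_of_three(values):
    """Pick the median of the first, middle, and last elements as the pivot."""
    first, mid, last = values[0], values[len(values) // 2], values[-1]
    return sorted([first, mid, last])[1]

print(median_of_three([1, 2, 3, 4, 5, 6]))  # 4, splitting into 1,2,3 and 5,6
```

Randomized pivot selection, as mentioned in the first answer, is another standard choice.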
In terms of big-O notation, insertion sort is O(n^2) in the worst case (though O(n) on already-sorted input), while quicksort is O(n^2) in the worst case and O(n log(n)) in the best and average cases. Insertion sort is almost never preferable to quicksort for large, unordered inputs.
Addendum: Insertion sort works well on a pre-sorted list because it works by iteratively comparing each element to its adjacent element to see whether they should be swapped with one another. In a pre-sorted list no swapping occurs, so no more than one comparison per element is needed, and the whole sort runs in O(n).
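The addendum can be sketched as follows (a counting version assumed for illustration): on a pre-sorted list of n elements, each element needs exactly one comparison before the inner loop stops, so n-1 comparisons total.

```python
def insertion_sort(values):
    """Insertion sort a copy of `values`; return (sorted list, comparison count)."""
    result = list(values)
    comparisons = 0
    for i in range(1, len(result)):
        j = i
        # shift larger neighbours right until result[j] is in place
        while j > 0:
            comparisons += 1
            if result[j - 1] <= result[j]:
                break  # already in order: one comparison, no swap
            result[j - 1], result[j] = result[j], result[j - 1]
            j -= 1
    return result, comparisons

sorted_list, n_cmp = insertion_sort(list(range(100)))
print(n_cmp)  # 99 comparisons for a pre-sorted list of 100 elements: O(n)
```

This is why insertion sort (or a hybrid that falls back to it, as many library sorts do for small or nearly-sorted runs) handles almost-sorted data so well.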
Source: https://stackoverflow.com/questions/53734024/why-quicksort-bad-at-sorting-almost-sorted-data