When analyzing QS, everyone always refers to the "almost sorted" worst case. When can such a scenario occur with natural input?
The only example I came up with is …
The actual question was: "When can such a scenario (almost sorted) occur with natural input?".
Although all the other answers deal with "what causes worst-case performance", none has covered "what causes data that meets the worst-case scenario".
Programmer error: Basically, you end up sorting a list twice. Typically this happens because the list is sorted in one place in the code, and later, in another piece of code, you know you need the list to be sorted, so you sort it again.
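A minimal sketch of that pattern (the names Invoice, load_invoices and render_report are all hypothetical; Python's built-in sort is not a naive quicksort, the point is only the shape of input the second sort receives):

```python
from dataclasses import dataclass

@dataclass
class Invoice:
    date: str

def load_invoices(raw_dates):
    """Hypothetical loader: sorts the list once, at the source."""
    invoices = [Invoice(d) for d in raw_dates]
    invoices.sort(key=lambda inv: inv.date)   # first sort
    return invoices

def render_report(invoices):
    """Elsewhere in the codebase, sorted again 'just to be safe'.
    This second sort receives fully sorted input -- exactly the
    worst case for a quicksort that pivots on the first element."""
    invoices.sort(key=lambda inv: inv.date)   # redundant second sort
    return invoices

report = render_report(load_invoices(["2021-03-01", "2021-01-15", "2021-02-10"]))
```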
Using almost-chronological data: You have data that generally arrives in chronological order, but occasionally some elements are out of position. (Consider a multi-threaded environment adding time-stamped elements to a list: race conditions can cause elements to be added in a different order from the one in which they were time-stamped.) In this situation, if you need sorted data, you must re-sort, because the order of the data is not guaranteed.
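A small simulation of that race, using only the standard library (the switch-interval tweak just makes pre-emption frequent enough to observe; the exact inversion count varies from run to run):

```python
import sys
import threading
import time

sys.setswitchinterval(1e-6)   # switch threads very often so the race shows up

events = []

def worker():
    for _ in range(2000):
        stamp = time.monotonic()   # timestamp taken here...
        events.append(stamp)       # ...but the thread can be pre-empted in
                                   # between, so an earlier stamp may be
                                   # appended after a later one

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The list is "almost sorted": mostly increasing, with occasional
# inverted neighbours wherever two threads raced.
inversions = sum(a > b for a, b in zip(events, events[1:]))
print(f"{inversions} of {len(events) - 1} adjacent pairs out of order")
```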
Adding items to a list: If you have a sorted list and simply append some items (i.e. without using binary insertion), you end up needing to re-sort an almost-sorted list.
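For contrast, Python's standard bisect module shows what binary insertion looks like: appending and re-sorting produces the almost-sorted case, while bisect.insort avoids the re-sort entirely.

```python
import bisect

scores = [3, 8, 15, 21, 42]        # already sorted

# Appending forces a full re-sort of an almost-sorted list:
scores_appended = scores + [10, 37]
scores_appended.sort()             # the input is almost sorted -- the bad
                                   # case for a first-element-pivot quicksort

# Binary insertion keeps the list sorted without any re-sort:
bisect.insort(scores, 10)          # O(log n) search + O(n) shift
bisect.insort(scores, 37)
print(scores)                      # [3, 8, 10, 15, 21, 37, 42]
```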
Data from an external source: If you receive data from an external source, there may be no guarantee that it's sorted, so you sort it yourself. However, if the external source happens to be sorted, you will be re-sorting sorted data.
Natural ordering: This is similar to the chronological data. Basically, the natural order of the data you receive may already be sorted. Consider an insurance company adding car registrations. If the authority assigning car registrations does so in a predictable order, newer cars are likely, but not guaranteed, to have higher registration numbers. Since you're not guaranteed it's sorted, you have to re-sort.
Interleaved data: If you receive data from multiple sorted sources with overlapping keys, you could end up with keys resembling the following: 1 3 2 5 4 7 6 9 8 11 10 13 12 15 14 17 16 19 18. Even though half the elements are out of sequence with their neighbours, the list is "almost sorted", and a QuickSort that pivots on the first element would certainly exhibit O(n^2) performance, as the sketch below demonstrates.
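A quick way to see this degradation is to count comparisons made by a naive first-element-pivot quicksort on such interleaved keys versus the same keys shuffled. The sketch below is an illustration, not a production sort:

```python
import random

comparisons = 0

def quicksort_first_pivot(xs):
    """Naive quicksort that always pivots on the first element."""
    global comparisons
    if len(xs) <= 1:
        return xs
    pivot, rest = xs[0], xs[1:]
    comparisons += len(rest)       # one conceptual comparison per element
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return quicksort_first_pivot(left) + [pivot] + quicksort_first_pivot(right)

# Interleaved keys in the pattern above: 1 3 2 5 4 7 6 ...
n = 400
interleaved = [1] + [x for k in range(2, n, 2) for x in (k + 1, k)]

comparisons = 0
quicksort_first_pivot(interleaved)
print("almost sorted:", comparisons)   # grows like n^2/4 -> quadratic

comparisons = 0
shuffled = interleaved[:]
random.shuffle(shuffled)
quicksort_first_pivot(shuffled)
print("shuffled:     ", comparisons)   # grows like n log n
```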
So, given all the above scenarios, it's actually quite easy to end up sorting almost-sorted data, and this is exactly why a QuickSort that pivots on the first element is best avoided. polygene has provided some interesting information on alternative pivoting considerations.
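polygene's suggestions aren't reproduced here, but as one common mitigation (a textbook technique, not necessarily the one that answer describes), a median-of-three pivot defeats the sorted and almost-sorted worst case, because on such input the middle element wins and the partitions stay balanced:

```python
def median_of_three(xs):
    """Median of the first, middle and last elements."""
    return sorted((xs[0], xs[len(xs) // 2], xs[-1]))[1]

def quicksort_mo3(xs):
    """Quicksort with a median-of-three pivot and a three-way partition
    (the equal bucket also keeps duplicate keys from degrading it)."""
    if len(xs) <= 1:
        return xs
    pivot = median_of_three(xs)
    less = [x for x in xs if x < pivot]
    equal = [x for x in xs if x == pivot]
    greater = [x for x in xs if x > pivot]
    return quicksort_mo3(less) + equal + quicksort_mo3(greater)

print(quicksort_mo3([1, 3, 2, 5, 4, 7, 6, 9, 8]))   # [1, 2, 3, ..., 9]
```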
As a side note: one of the usually worst-performing sorting algorithms actually does quite well with "almost-sorted" data. On the interleaved data above, bubble sort requires only 9 swap operations, and its performance would actually be O(n), as the sketch below shows.
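A sketch that verifies the count, using bubble sort with the standard early-exit optimisation: on the interleaved keys it performs exactly 9 swaps (every inversion is between adjacent neighbours) and terminates after 2 passes.

```python
def bubble_sort(xs):
    """Bubble sort with the early-exit optimisation: stop as soon as a
    full pass makes no swaps."""
    xs = xs[:]
    swaps = passes = 0
    swapped = True
    while swapped:
        swapped = False
        passes += 1
        for i in range(len(xs) - 1):
            if xs[i] > xs[i + 1]:
                xs[i], xs[i + 1] = xs[i + 1], xs[i]
                swaps += 1
                swapped = True
    return xs, swaps, passes

data = [1, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18]
_, swaps, passes = bubble_sort(data)
print(swaps, passes)   # 9 swaps over 2 passes -> effectively O(n)
```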