Quicksort vs heapsort


Question


Both quicksort and heapsort do in-place sorting. Which is better? What are the applications and cases in which either is preferred?


Answer 1:


This paper has some analysis.

Also, from Wikipedia:

The most direct competitor of quicksort is heapsort. Heapsort is typically somewhat slower than quicksort, but the worst-case running time is always Θ(n log n). Quicksort is usually faster, though there remains the chance of worst case performance except in the introsort variant, which switches to heapsort when a bad case is detected. If it is known in advance that heapsort is going to be necessary, using it directly will be faster than waiting for introsort to switch to it.
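
To make the introsort switch concrete, here is a minimal sketch (my own illustration in the style of the C# code further down this page, not the Wikipedia or .NET implementation): quicksort as usual, but once the recursion depth exceeds roughly 2 * log2(n), assume a bad case and finish that subrange with heapsort. The method names (IntroSort, HeapSortRange, SiftDown) are placeholders.

// Minimal introsort-style sketch: quicksort until the recursion gets too deep,
// then switch to heapsort for that subrange ("bad case detected").
static void IntroSort(int[] a)
{
    int depthLimit = 0;                                    // roughly 2 * log2(n)
    for (int n = a.Length; n > 1; n >>= 1) depthLimit += 2;
    IntroSort(a, 0, a.Length - 1, depthLimit);
}

static void IntroSort(int[] a, int lo, int hi, int depthLimit)
{
    if (lo >= hi) return;
    if (depthLimit == 0)
    {
        HeapSortRange(a, lo, hi);                          // fall back to heapsort
        return;
    }

    // Plain Lomuto partition around a[hi] (any partition scheme would do here).
    int pivot = a[hi], i = lo;
    for (int j = lo; j < hi; j++)
        if (a[j] < pivot) { (a[i], a[j]) = (a[j], a[i]); i++; }
    (a[i], a[hi]) = (a[hi], a[i]);

    IntroSort(a, lo, i - 1, depthLimit - 1);
    IntroSort(a, i + 1, hi, depthLimit - 1);
}

// In-place heapsort of a[lo..hi]: build a max-heap, then repeatedly move the maximum to the end.
static void HeapSortRange(int[] a, int lo, int hi)
{
    int n = hi - lo + 1;
    for (int i = n / 2 - 1; i >= 0; i--) SiftDown(a, lo, i, n);
    for (int end = n - 1; end > 0; end--)
    {
        (a[lo], a[lo + end]) = (a[lo + end], a[lo]);
        SiftDown(a, lo, 0, end);
    }
}

static void SiftDown(int[] a, int lo, int root, int count)
{
    while (true)
    {
        int child = 2 * root + 1;
        if (child >= count) return;
        if (child + 1 < count && a[lo + child] < a[lo + child + 1]) child++;
        if (a[lo + root] >= a[lo + child]) return;
        (a[lo + root], a[lo + child]) = (a[lo + child], a[lo + root]);
        root = child;
    }
}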




Answer 2:


Heapsort is guaranteed O(N log N), which is much better than the worst case of Quicksort. Heapsort also doesn't need extra memory for another array to hold the ordered data, as Mergesort does. So why do commercial applications stick with Quicksort? What does Quicksort have that is so special over other implementations?

I've tested the algorithms myself, and I've seen that Quicksort really does have something special. It runs fast, much faster than the Heap and Merge algorithms.

The secret of Quicksort is that it almost never does unnecessary element swaps, and swapping is time consuming.

With Heapsort, even if all of your data is already ordered, you are going to swap 100% of the elements to order the array.

With Mergesort, it's even worse: you are going to write 100% of the elements into another array and write them back into the original one, even if the data is already ordered.

With Quicksort you don't swap what is already ordered. If your data is completely ordered, you swap almost nothing! Although there is a lot of fussing about the worst case, a small improvement in the choice of pivot, anything other than taking the first or last element of the array, can avoid it. Taking the pivot as the median of the first, middle and last elements is enough to avoid the worst case on typical inputs.

What is superior about Quicksort is not the worst case but the best case! In the best case you do the same number of comparisons, but you swap almost nothing. In the average case you swap part of the elements, but not all of them, as Heapsort and Mergesort do. That is what gives Quicksort the best time: fewer swaps, more speed.

The implementation below in C# on my computer, running in release mode, beats Array.Sort by 3 seconds with the middle pivot and by 2 seconds with the improved pivot (yes, there is an overhead to getting a good pivot).

// Benchmark harness (assumes a console application with "using System;" at the top).
static void Main(string[] args)
{
    // 100 million random ints to sort
    int[] arrToSort = new int[100000000];
    var r = new Random();
    for (int i = 0; i < arrToSort.Length; i++) arrToSort[i] = r.Next(1, arrToSort.Length);

    Console.WriteLine("Press q to quick sort, s to Array.Sort");
    while (true)
    {
        var k = Console.ReadKey(true);
        if (k.KeyChar == 'q')
        {
            // quick sort
            Console.WriteLine("Beg quick sort at " + DateTime.Now.ToString("HH:mm:ss.ffffff"));
            QuickSort(arrToSort, 0, arrToSort.Length - 1);
            Console.WriteLine("End quick sort at " + DateTime.Now.ToString("HH:mm:ss.ffffff"));
            for (int i = 0; i < arrToSort.Length; i++) arrToSort[i] = r.Next(1, arrToSort.Length);
        }
        else if (k.KeyChar == 's')
        {
            Console.WriteLine("Beg Array.Sort at " + DateTime.Now.ToString("HH:mm:ss.ffffff"));
            Array.Sort(arrToSort);
            Console.WriteLine("End Array.Sort at " + DateTime.Now.ToString("HH:mm:ss.ffffff"));
            for (int i = 0; i < arrToSort.Length; i++) arrToSort[i] = r.Next(1, arrToSort.Length);
        }
    }
}

// In-place quicksort with Hoare-style partitioning.
// Note: the median-of-three logic below relies on Int32.CompareTo returning exactly
// -1, 0 or 1 (which it does for int, although the docs only guarantee the sign).
static public void QuickSort(int[] arr, int left, int right)
{
    int begin = left
        , end = right
        , pivot
        // middle-element pivot:
        //= arr[(left + right) / 2]
        ;

    // improved pivot: median of the first, middle and last elements
    int middle = (left + right) / 2;
    int
        LM = arr[left].CompareTo(arr[middle])
        , MR = arr[middle].CompareTo(arr[right])
        , LR = arr[left].CompareTo(arr[right])
        ;
    if (-1 * LM == LR)
        pivot = arr[left];        // arr[left] lies between the other two
    else if (MR == -1 * LR)
        pivot = arr[right];       // arr[right] lies between the other two
    else
        pivot = arr[middle];      // otherwise arr[middle] is the median

    // Hoare-style partition: scan from both ends and swap out-of-place pairs
    do
    {
        while (arr[left] < pivot) left++;
        while (arr[right] > pivot) right--;

        if (left <= right)
        {
            int temp = arr[right];
            arr[right] = arr[left];
            arr[left] = temp;

            left++;
            right--;
        }
    } while (left <= right);

    // Recurse into the two partitions
    if (left < end) QuickSort(arr, left, end);
    if (begin < right) QuickSort(arr, begin, right);
}



Answer 3:


For most situations, having quick vs. a little quicker is irrelevant... you simply never want it to occasionally get waayyy slow. Although you can tweak QuickSort to avoid the way slow situations, you lose the elegance of the basic QuickSort. So, for most things, I actually prefer HeapSort... you can implement it in its full simple elegance, and never get a slow sort.
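
For reference, a minimal sketch of that simple elegance (my own illustration, assuming an int[] and the placeholder names HeapSort/SiftDown): build a max-heap in place, then repeatedly move the maximum to the end of the array.

// In-place, iterative heapsort: O(n log n) worst case, no extra array, no recursion.
static void HeapSort(int[] a)
{
    int n = a.Length;
    for (int i = n / 2 - 1; i >= 0; i--) SiftDown(a, i, n);          // heapify
    for (int end = n - 1; end > 0; end--)
    {
        (a[0], a[end]) = (a[end], a[0]);                             // move the max to its final place
        SiftDown(a, 0, end);                                         // restore the heap property
    }
}

static void SiftDown(int[] a, int root, int count)
{
    while (true)
    {
        int child = 2 * root + 1;                                    // left child
        if (child >= count) return;
        if (child + 1 < count && a[child] < a[child + 1]) child++;   // pick the larger child
        if (a[root] >= a[child]) return;                             // heap property already holds
        (a[root], a[child]) = (a[child], a[root]);
        root = child;
    }
}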

For situations where you DO want max speed in most cases, QuickSort may be preferred over HeapSort, but neither may be the right answer. For speed-critical situations, it is worth examining closely the details of the situation. For example, in some of my speed-critical code, it is very common that the data is already sorted or near-sorted (it is indexing multiple related fields that often either move up and down together OR move up and down opposite each other, so once you sort by one, the others are either sorted or reverse-sorted or close... either of which can kill QuickSort). For that case, I implemented neither... instead, I implemented Dijkstra's SmoothSort... a HeapSort variant that is O(N) when already sorted or near-sorted... it is not so elegant, not too easy to understand, but fast... read http://www.cs.utexas.edu/users/EWD/ewd07xx/EWD796a.PDF if you want something a bit more challenging to code.




Answer 4:


Quicksort-Heapsort in-place hybrids are really interesting too, since most of them only need n log n comparisons in the worst case (they are optimal with respect to the first term of the asymptotics, so they avoid the worst-case scenarios of Quicksort), use O(log n) extra space, and preserve at least "half" of Quicksort's good behaviour on already-ordered data. An extremely interesting algorithm is presented by Diekert and Weiss in http://arxiv.org/pdf/1209.4214v1.pdf (a simplified sketch follows after this list):

  • Select a pivot p as the median of a random sample of sqrt(n) elements (this can be done in at most 24 sqrt(n) comparisons via the algorithm of Tarjan & co., or 5 sqrt(n) comparisons via the much more convoluted spider-factory algorithm of Schönhage);
  • Partition your array in two parts as in the first step of Quicksort;
  • Heapify the smallest part and use O(log n) extra bits to encode a heap in which every left child has a value greater than its sibling;
  • Recursively extract the root of the heap, sift down the gap left by the root until it reaches a leaf of the heap, then fill the gap with an appropriate element taken from the other part of the array;
  • Recur over the remaining non-ordered part of the array (if p is chosen as the exact median, there is no recursion at all).
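
A heavily simplified sketch of the hybrid idea (my own illustration, NOT the Diekert-Weiss algorithm itself: it omits the median-of-sqrt(n) pivot sample and the O(log n)-bit heap encoding described above): partition as in Quicksort, heapsort the smaller part in place, and keep working on the larger part.

// Simplified Quicksort/Heapsort hybrid: partition, heapsort the smaller side, loop on the larger.
static void HybridSort(int[] a, int lo, int hi)
{
    while (lo < hi)
    {
        // Plain Lomuto partition around a[hi] (the real algorithm samples sqrt(n) elements).
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++)
            if (a[j] < pivot) { (a[i], a[j]) = (a[j], a[i]); i++; }
        (a[i], a[hi]) = (a[hi], a[i]);

        if (i - lo <= hi - i)
        {
            HeapSortRange(a, lo, i - 1);   // heapsort the smaller (left) part in place
            lo = i + 1;                    // continue with the larger (right) part
        }
        else
        {
            HeapSortRange(a, i + 1, hi);   // heapsort the smaller (right) part in place
            hi = i - 1;                    // continue with the larger (left) part
        }
    }
}

// In-place heapsort of a[lo..hi] (same helper as in the sketch under Answer 1).
static void HeapSortRange(int[] a, int lo, int hi)
{
    int n = hi - lo + 1;
    for (int i = n / 2 - 1; i >= 0; i--) SiftDown(a, lo, i, n);
    for (int end = n - 1; end > 0; end--)
    {
        (a[lo], a[lo + end]) = (a[lo + end], a[lo]);
        SiftDown(a, lo, 0, end);
    }
}

static void SiftDown(int[] a, int lo, int root, int count)
{
    while (true)
    {
        int child = 2 * root + 1;
        if (child >= count) return;
        if (child + 1 < count && a[lo + child] < a[lo + child + 1]) child++;
        if (a[lo + root] >= a[lo + child]) return;
        (a[lo + root], a[lo + child]) = (a[lo + child], a[lo + root]);
        root = child;
    }
}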



Answer 5:


Comparing quicksort and heapsort: both are in-place sorts, but they differ in worst-case running time. The worst case for quicksort is O(n^2), while for heapsort it is still O(n log n), so for an average amount of data quicksort will be more useful. Randomized quicksort picks its pivot at random, so how quickly it finishes depends on the position of the pivot element that gets chosen. Writing s for the size of the current subarray and L and G for the parts less than and greater than the pivot:

Good call: the sizes of L and G are each less than 3s/4.

Bad call: one of L and G has size greater than 3s/4.

(A good call happens with probability roughly 1/2, since any pivot whose rank falls in the middle half of the subarray produces one.) For small amounts of data we can go for insertion sort, and for very large amounts of data go for heapsort.




Answer 6:


Well, if you go down to the architecture level: the cache works on contiguous blocks of memory, so whatever is already in a cached block can be compared without going back to main memory. In quicksort we have no problem dividing the array into contiguous ranges of any length, but in heapsort (using an array) it may happen that a parent is not present in the part of the array currently in the cache, and then it has to be brought into cache memory, which is time consuming. That's why quicksort is best! 😀




Answer 7:


Heapsort builds a heap and then repeatedly extracts the maximum item. Its worst case is O(n log n).

But if you look at the worst case of quicksort, which is O(n^2), you realize that quicksort would be a not-so-good choice for large data.

So this makes sorting an interesting thing; I believe the reason so many sorting algorithms exist today is that each of them is 'best' in its best places. For instance, bubble sort can outperform quicksort if the data is already sorted (see the sketch below). Or if we know something about the items to be sorted, then we can probably do better.
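
To illustrate the bubble sort point, a minimal sketch (my own illustration): with an early-exit flag, bubble sort finishes in a single O(n) pass over already-sorted input, while quicksort still does its full O(n log n) work.

// Bubble sort with an early-exit flag: one pass and done if the input is already sorted.
static void BubbleSort(int[] a)
{
    for (int n = a.Length; n > 1; n--)
    {
        bool swapped = false;
        for (int i = 1; i < n; i++)
        {
            if (a[i - 1] > a[i])
            {
                (a[i - 1], a[i]) = (a[i], a[i - 1]);
                swapped = true;
            }
        }
        if (!swapped) return;   // no swaps in a full pass: the array is sorted
    }
}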

This may not answer your question directly; I just thought I'd add my two cents.




Answer 8:


Heapsort has the benefit of a worst-case running time of O(n log n), so in cases where quicksort is likely to perform poorly (generally, mostly-sorted data sets) heapsort is much preferred.




Answer 9:


Heapsort is a safe bet when dealing with very large inputs. Asymptotic analysis reveals that the order of growth of heapsort in the worst case is O(n log n), which is better than quicksort's O(n^2) worst case. However, heapsort is somewhat slower in practice on most machines than a well-implemented quicksort. Heapsort is also not a stable sorting algorithm.

The reason heapsort is slower in practice than quicksort is quicksort's better locality of reference (https://en.wikipedia.org/wiki/Locality_of_reference): its data elements are within relatively close storage locations. Systems that exhibit strong locality of reference are great candidates for performance optimization. Heapsort, however, makes larger leaps across the array. This makes quicksort more favorable for smaller inputs.




Answer 10:


To me there is a very fundamental difference between heapsort and quicksort: the latter uses recursion, and in recursive algorithms the memory used by the call stack grows with the recursion depth. This does not matter if n is small, but right now I am sorting two matrices with n = 10^9! The program takes almost 10 GB of RAM, and any extra memory will make my computer start swapping to virtual disk memory. My disk is a RAM disk, but swapping to it still makes a huge difference in speed. So in a statpack coded in C++ that includes adjustable-dimension matrices, with size unknown in advance to the programmer, and nonparametric statistical kinds of sorting, I prefer heapsort to avoid delays for users with very big data matrices.




Answer 11:


To answer the original question and address some of the other comments here:

I just compared implementations of selection, quick, merge, and heap sort to see how they'd stack up against each other. The answer is that they all have their downsides.

TL;DR: Quick is the best general-purpose sort (reasonably fast and mostly in-place), though a typical in-place quicksort is not stable. Personally I prefer heap sort, though, unless I need a stable sort.

Selection - N^2 - It's really only good for less than 20 elements or so, then it's outperformed. Unless your data is already sorted, or very, very nearly so. N^2 gets really slow really fast.

Quick, in my experience, is not actually that quick all the time. The bonus of using quicksort as a general sort is that it's reasonably fast; a typical in-place implementation is not stable, though. It's an in-place algorithm, but as it's generally implemented recursively, it will take up additional stack space. Its running time also falls somewhere between O(n log n) and O(n^2), and timing some sorts seems to confirm this, especially when the values fall within a tight range. It's way faster than selection sort on 10,000,000 items, but slower than merge or heap.

Merge sort is guaranteed O(n log n) since its sort is not data dependent. It just does what it does, regardless of what values you've given it. It's also stable, but very large sorts can blow out your stack if you're not careful about implementation. There are some complex in-place merge sort implementations, but generally you need another array in each level to merge your values into. If those arrays live on the stack you can run into issues.

Heap sort is max O(n log n), but in many cases is quicker, depending on how far you have to move your values up the log n deep heap. The heap can easily be implemented in-place in the original array, so it needs no additional memory, and it's iterative, so no worries about stack overflow while recursing. The huge downside to heap sort is that it is not a stable sort, which means it's right out if you need that.



Source: https://stackoverflow.com/questions/2467751/quicksort-vs-heapsort
