Why does this quick sort cause stack overflow on nearly sorted lists and sorted lists?

Posted by 浪子不回头ぞ on 2019-12-01 17:32:48
Dukeling

For a random array, you could partition off massive chunks of the data.
But for a (nearly) sorted array, you'd mostly be partitioning off 1 element at a time.

So, for a sorted array, your stack size would end up being the same as the size of the array, while, for a random array, it's much more likely to be about a logarithm of that size.

So, even if the random array is much larger than a nearly sorted one, it's not surprising that the smaller one throws an exception, but the larger one doesn't.
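To see the difference in numbers, here is a minimal sketch that records the deepest recursion level reached by a plain quicksort. The class name DepthDemo, the depth counter, and the choice of the last element as pivot are all assumptions made for illustration, not the code from the question.

```java
import java.util.Random;

// Hypothetical demo: measures how deep a plain recursive quicksort recurses.
final class DepthDemo {
    static int maxDepth;

    // Plain quicksort; the last element is used as pivot (an assumption).
    static void sort(int[] a, int lo, int hi, int depth) {
        if (lo >= hi) return;
        maxDepth = Math.max(maxDepth, depth);
        int pivot = a[hi];
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) {
                int t = a[i]; a[i] = a[store]; a[store] = t;
                store++;
            }
        }
        int t = a[hi]; a[hi] = a[store]; a[store] = t;
        sort(a, lo, store - 1, depth + 1);
        sort(a, store + 1, hi, depth + 1);
    }

    // Sorts the array in place and returns the deepest level reached.
    static int depthFor(int[] a) {
        maxDepth = 0;
        sort(a, 0, a.length - 1, 1);
        return maxDepth;
    }

    public static void main(String[] args) {
        int n = 2000; // small enough that the sorted case still fits on a default stack
        int[] sorted = new int[n];
        for (int i = 0; i < n; i++) sorted[i] = i;
        int[] random = new Random(42).ints(n, 0, 1_000_000).toArray();
        System.out.println("sorted input, max depth: " + depthFor(sorted));
        System.out.println("random input, max depth: " + depthFor(random));
    }
}
```

For the sorted input the depth grows linearly with n, because each call peels off only the pivot. For the random input it should stay far smaller.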

Modifying your code

In terms of a fix, as EJP pointed out, you should do the smaller partition first to limit stack growth. But this in itself won't fix the problem, because Java doesn't perform tail-call optimization (as I understand it, it's optional for an implementation).

A fairly simple fix here is to throw your function into a while-loop, essentially hard-coding the tail-call optimization.

To give a better idea of what I mean:

public static void quickSort(int[] arr, int highE, int lowE)
{
    while (true)
    {
        if (lowE + 29 < highE)
        {
            ...
            quickSort(arr, storeE - 1, lowE);

            // not doing this any more
            //quickSort(arr, highE, storeE + 1);

            // instead, simply set the parameters to their new values
            // highE = highE;
            lowE = storeE + 1;
        }
        else
        {
            insertSort(arr, highE, lowE);
            return;
        }
    }
}

Well, now that you have the basic idea, this would look better (functionally equivalent to the above, just more concise):

public static void quickSort(int[] arr, int highE, int lowE)
{
    while (lowE + 29 < highE)
    {
        ...
        quickSort(arr, storeE - 1, lowE);
        lowE = storeE + 1;
    }
    insertSort(arr, highE, lowE);
}

This of course doesn't actually do the smaller one first, but I'll leave that to you to figure out (seems you already have a fair idea of how to do this).
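If it helps, here is one way the loop could be combined with the smaller-partition-first rule. Treat it as a sketch under assumptions: the partition step is elided in the code above, so the partition method below (last element as pivot) is a stand-in, and the parameter order is (lo, hi) rather than the question's (highE, lowE). The class and method names are made up for this example.

```java
// Hypothetical sketch: recurse on the smaller side, loop on the larger side.
final class BoundedQuickSort {
    static final int CUTOFF = 29; // same threshold as in the code above

    // Note the (lo, hi) order, unlike the question's (highE, lowE).
    static void quickSort(int[] a, int lo, int hi) {
        while (lo + CUTOFF < hi) {
            int p = partition(a, lo, hi);
            if (p - lo < hi - p) {
                quickSort(a, lo, p - 1); // left side is smaller: recurse into it
                lo = p + 1;              // then keep looping on the right side
            } else {
                quickSort(a, p + 1, hi); // right side is smaller (or equal)
                hi = p - 1;              // then keep looping on the left side
            }
        }
        insertionSort(a, lo, hi);
    }

    // Stand-in partition with the last element as pivot (an assumption).
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) swap(a, store++, i);
        }
        swap(a, store, hi);
        return store; // final position of the pivot
    }

    // Insertion sort on the inclusive range [lo, hi]; does nothing if it is empty.
    static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            int v = a[i];
            int j = i - 1;
            while (j >= lo && a[j] > v) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = v;
        }
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

Each recursive call handles at most half of the current range, so the recursion depth stays around log2(n) even on sorted input. The running time on sorted input is still quadratic with this pivot choice; only the stack usage is fixed.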

How this works

For some made up values...

Your current code does this: (an indent indicates what happens inside that function call - thus increasing indentation means recursion)

quickSort(arr, 100, 0)
   quickSort(arr, 49, 0)
      quickSort(arr, 24, 0)
         insertion sort
      quickSort(arr, 49, 26)
         insertion sort
   quickSort(arr, 100, 51)
      quickSort(arr, 74, 51)
         insertion sort
      quickSort(arr, 100, 76)
         insertion sort

The modified code does this:

quickSort(arr, 100, 0)
   quickSort(arr, 49, 0)
      quickSort(arr, 24, 0)
         break out of the while loop
         insertion sort
      lowE = 26
      break out of the while loop
      insertion sort
   lowE = 51
   run another iteration of the while-loop
   quickSort(arr, 74, 51)
      break out of the while loop
      insertion sort
   lowE = 76
   break out of the while loop
   insertion sort

Increase the stack size

Not sure whether you've considered this, or whether it would work with your parameters, but you can always consider simply increasing the stack size with the -Xss command-line parameter.
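On the command line that would look something like java -Xss64m Main, where Main and the 64m value are placeholders. If you can't change the launch command, another option is to run the sort on a thread created with an explicit stack size. The sketch below uses the four-argument Thread constructor; the class name, the depth method (a stand-in for a deeply recursive sort), and the sizes are assumptions. The JVM is allowed to treat the requested stack size as a hint only.

```java
// Hypothetical sketch: run deep recursion on a thread with a larger stack.
final class BigStackDemo {
    // Deliberately deep recursion, standing in for a worst-case quicksort.
    static int depth(int n) {
        return n == 0 ? 0 : 1 + depth(n - 1);
    }

    // Runs depth(n) on a new thread that requests stackBytes of stack.
    static int runWithStack(long stackBytes, int n) {
        int[] result = new int[1];
        Thread t = new Thread(null, () -> result[0] = depth(n), "big-stack", stackBytes);
        t.start();
        try {
            t.join(); // wait for the worker; join also makes result[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        // 64 MB is an arbitrary size chosen for the demo.
        System.out.println(runWithStack(64L * 1024 * 1024, 100_000));
    }
}
```

This only moves the limit; a large enough sorted input will still overflow, so it is best seen as a stopgap next to the algorithmic fix.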

Don Knuth, in The Art of Computer Programming [1], suggests always pushing the larger of the two partitions and sorting the smaller one immediately, to limit stack growth. In your code that corresponds to recursively sorting the smaller of the two partitions first, then the other one.

[1]: The Art of Computer Programming, Vol. 3, Section 5.2.2, p. 114.

The StackOverflowError is most likely caused by recursion that goes too deep. With more elements to sort, your quicksort must make more recursive calls to quickSort() before it reaches the insertion sort part. At some point the recursion is too deep and there are too many method calls on the stack.

It might be that already sorted lists lead to deeper recursion, so the sort crashes earlier, with fewer elements, than it does on an unsorted list. This depends on the implementation.

For non-academic and non-learning purposes, it is generally preferable to implement these algorithms iteratively instead of recursively.

Check if you have long runs of identical elements. The partition part:

for (int i = lowE; i < highE; i++)
{
    if (arr[i] < pivotVal)
    {
        swapElements(arr, storeE, i);
        storeE++;
    }
}

partitions a list of identical elements in the worst possible way: no element is less than the pivot, so storeE never advances and one side of the split is always empty.
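A quick way to check this is to run the loop on an array of equal values and look at where the pivot ends up. In the sketch below the loop is the one quoted above; the surrounding method, the pivot taken from arr[highE], the final swap, and the class name are assumptions added to make it runnable.

```java
import java.util.Arrays;

// Hypothetical wrapper around the partition loop quoted above.
final class EqualKeysDemo {
    // Returns the index where the pivot ends up.
    static int partition(int[] arr, int lowE, int highE) {
        int pivotVal = arr[highE]; // assumed pivot choice: last element
        int storeE = lowE;
        for (int i = lowE; i < highE; i++) {
            if (arr[i] < pivotVal) {
                swapElements(arr, storeE, i);
                storeE++;
            }
        }
        swapElements(arr, storeE, highE); // move the pivot into place
        return storeE;
    }

    static void swapElements(int[] arr, int i, int j) {
        int t = arr[i]; arr[i] = arr[j]; arr[j] = t;
    }

    public static void main(String[] args) {
        int[] same = new int[10];
        Arrays.fill(same, 7);
        // With identical values the returned index equals lowE,
        // so the split is 0 elements on one side and 9 on the other.
        System.out.println(partition(same, 0, same.length - 1));
    }
}
```

A common remedy, which the code in the question does not use, is a three-way partition that groups elements equal to the pivot in the middle.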

With a sorted or nearly sorted data set, and a naive pivot choice such as the first or last element, quicksort exhibits its worst-case running time of O(n^2). For large n, the recursion tree becomes so deep that the call stack is exhausted before the sort finishes. Such algorithms are generally safer to implement iteratively than recursively.
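For reference, an iterative version could look roughly like the sketch below, which replaces the call stack with an explicit stack of index ranges on the heap. This is not the code from the question: the class name, the use of ArrayDeque, and the middle-element pivot are assumptions chosen for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: quicksort driven by an explicit stack of ranges.
final class IterativeQuickSort {
    static void sort(int[] a) {
        Deque<int[]> pending = new ArrayDeque<>();
        pending.push(new int[] {0, a.length - 1});
        while (!pending.isEmpty()) {
            int[] range = pending.pop();
            int lo = range[0];
            int hi = range[1];
            if (lo >= hi) continue; // zero or one element: nothing to do
            int p = partition(a, lo, hi);
            // Push the larger side first so the smaller side is handled next;
            // this keeps the number of pending ranges small.
            if (p - lo < hi - p) {
                pending.push(new int[] {p + 1, hi});
                pending.push(new int[] {lo, p - 1});
            } else {
                pending.push(new int[] {lo, p - 1});
                pending.push(new int[] {p + 1, hi});
            }
        }
    }

    // Middle element as pivot (an assumption), which avoids the
    // quadratic case on already sorted input.
    static int partition(int[] a, int lo, int hi) {
        swap(a, lo + (hi - lo) / 2, hi);
        int pivot = a[hi];
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) swap(a, store++, i);
        }
        swap(a, store, hi);
        return store;
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

Calling IterativeQuickSort.sort(arr) sorts the array in place. A bad split can no longer cause a StackOverflowError, because the pending ranges live on the heap, although a poor pivot choice can still make the sort slow.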
