Why does Java's Arrays.sort method use two different sorting algorithms for different types?

Java 6's Arrays.sort method uses Quicksort for arrays of primitives and merge sort for arrays of objects. I believe that most of time Quicksort is faster than merge sort and costs less memory. My experiments support that, although both algorithms are O(n log(n)). So why are different algorithms used for different types?

Michael Borgwardt

The most likely reason: quicksort is not stable, i.e. equal entries can change their relative position during the sort; among other things, this means that if you sort an already sorted array, it may not stay unchanged.

Since primitive types have no identity (there is no way to distinguish two ints with the same value), this does not matter for them. But for reference types, it could cause problems for some applications. Therefore, a stable merge sort is used for those.

OTOH, a reason not to use the (guaranteed n*log(n)) stable merge sort for primitive types might be that it requires making a clone of the array. For reference types, where the referred objects usually take up far more memory than the array of references, this generally does not matter. But for primitive types, cloning the array outright doubles the memory usage.

Will Byrne

According to Java 7 API docs cited in this answer, Arrays#Sort() for object arrays now uses TimSort, which is a hybrid of MergeSort and InsertionSort. On the other hand, Arrays#sort() for primitive arrays now uses Dual-Pivot QuickSort. These changes were implemented starting in Java SE 7.

One reason I can think of is that quicksort has a worst case time complexity of O(n^2) while mergesort retains worst case time of O(n log n). For object arrays there is a fair expectation that there will be multiple duplicate object references which is one case where quicksort does worst.

There is a decent visual comparison of various algorithms, pay particular attention to the right-most graph for different algorithms.

I was taking Coursera class on Algorithms and in one of the lectures Professor Bob Sedgewick mentioning the assessment for Java system sort:

"If a programmer is using objects, maybe space is not a critically important consideration and the extra space used by a merge sort maybe not a problem. And if a programmer is using primitive types, maybe the performance is the most important thing so they use quick sort."

Java's Arrays.sort method uses quicksort, insertion sort and mergesort. There is even both a single and dual pivot quicksort implemented in the OpenJDK code. The fastest sorting algorithm depends on the circumstances and the winners are: insertion sort for small arrays (47 currently chosen), mergesort for mostly sorted arrays, and quicksort for the remaining arrays so Java's Array.sort() tries to choose the best algorithm to apply based on those criteria.

java.util.Arrays uses quicksort for primitive types such as int and mergesort for objects that implement Comparable or use a Comparator. The idea of using two different methods is that if a programmer’s using objects maybe space is not a critically important consideration and so the extra space used by mergesort maybe’s not a problem and if the programmer’s using primitive types maybe performance is the most important thing so use the quicksort.

For Example: This is the example when sorting stability matters.

That’s why stable sorts make sense for object types, especially mutable object types and object types with more data than just the sort key, and mergesort is such a sort. But for primitive types stability is not only irrelevant. It’s meaningless.

Source: INFO

来源：https://stackoverflow.com/questions/3707190/why-does-javas-arrays-sort-method-use-two-different-sorting-algorithms-for-diff

标签

java

algorithm

quicksort

mergesort