问题
I have been picking and probing at Swift standard libraries sort()
function for its Array
type. To my surprise I have noticed it performs poorly on already-sorted data.
Sorting an array of Int
which is shuffled seems to be 5x faster than sorting that very same array when it is already sorted. Sorting an array of shuffled objects is about 4x faster than sorting the very same one already in sorted order (sorting object array vs Int
array use different algorithms I am sure so I sorted both to eliminate bias).
These are the results:
Shuffled Int array sort time: 1.3961209654808
Shuffled ColorObject array sort time: 3.14633798599243
NOnshuffled Int array sort time: 7.34714204072952
NOnshuffled ColorObject array sort time: 10.9310839772224
For reference below is my code:
class ElapsedTimer {
let startTime: CFAbsoluteTime
var endTime: CFAbsoluteTime?
init() {
startTime = CFAbsoluteTimeGetCurrent()
}
func stop() -> CFAbsoluteTime {
endTime = CFAbsoluteTimeGetCurrent()
return duration!
}
var duration: CFAbsoluteTime? {
if let endTime = endTime {
return endTime - startTime
} else {
return nil
}
}
}
public class CountedColor {
public private(set) var count: Int
public private(set) var color: UIColor
public init(color: UIColor, colorCount: Int) {
self.count = colorCount
self.color = color
}
}
var distributedIntArray = [Int]()
for value in 1..<1000000 {
distributedIntArray.append(value)
}
var distributedCountedColorArray = distributedIntArray.map{ CountedColor(color: UIColor.white, colorCount: $0) }
distributedCountedColorArray.shuffle()
distributedIntArray.shuffle()
var timer = ElapsedTimer()
distributedIntArray.sort()
print("Shuffled Int array sort time: \(timer.stop())")
timer = ElapsedTimer()
distributedCountedColorArray.sort{ return $0.count < $1.count }
print("Shuffled Color array sort time: \(timer.stop())")
timer = ElapsedTimer()
distributedIntArray.sort()
print("NOnshuffled Int array sort time: \(timer.stop())")
timer = ElapsedTimer()
distributedCountedColorArray.sort{ return $0.count < $1.count }
print("Non shuffled Color array sort time: \(timer.stop())")
My array shuffle()
method was pulled from this post. My ElapsedTimer
simply wraps and uses CACurrentMediaTime()
functions.
My question is why am I seeing this behavior? Especially when I am sorting the object array which should surely be using a general purpose sort. What kind of general purpose sorting algorithm is swift using? It surely can’t be one where the worst case and average case are the same like mergeSort.
回答1:
Swift uses Introsort. Looking at the source code we see that the chosen pivot is the first element. The wikipedia page on Introsort says:
(...), one of the critical operations is choosing the pivot: the element around which the list is partitioned. The simplest pivot selection algorithm is to take the first or the last element of the list as the pivot, causing poor behavior for the case of sorted or nearly sorted input.
Thus it is entirely predictable, given the implementation choice, that Swift's sorting performance is worst for sorted inputs.
I have built a complete benchmark for people who want to easily reproduce the OP's claims : https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/tree/master/extra/swift/sort
For reference, the GNU ISO C++ standard library uses a median-of-3 pivot (as per the stl_algo.h
header).
回答2:
This is a duplicate of this question it seems: Swift sorting algorithm implementation
Furthermore the reason why it performs so much better when shuffled is probably just because when its shuffled its performance is not hitting the upper bound of NlogN. Sorting the sorted array probably gets closer to that upper limit so its still all in all the same sort. But I dont know this is just theory
来源:https://stackoverflow.com/questions/41031106/which-general-purpose-sorting-algorithm-does-swift-use-it-does-not-perform-well