Optimal median of medians selection - 3 element blocks vs 5 element blocks?

[愿得一人] 2020-12-14 23:45

I'm working on a quicksort-variant implementation based on the Select algorithm for choosing a good pivot element. Conventional wisdom seems to be to divide the array into

3 Answers
  • 2020-12-15 00:04

    I believe it has to do with guaranteeing a "good" split. Dividing into 5-element blocks guarantees a worst-case split of 70-30. The standard argument goes like this: of the n/5 blocks, at least half have a median >= the median-of-medians, and each of those blocks has at least 3 elements (its median plus the 2 elements above it) >= the median-of-medians. That puts at least 3 * (n/5)/2 = 3n/10 elements on one side, which means the other partition holds at most 7n/10 elements in the worst case.

    That gives T(n) = T(n/5) + T(7n/10) + O(n).

    Since n/5 + 7n/10 = 9n/10 < n, the subproblem sizes shrink geometrically and the worst-case running time is O(n).
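
    To spell out why this recurrence is linear, here is the usual substitution argument as a sketch. The constants a and c are names introduced here, not taken from the answer: a bounds the O(n) term, c is the constant being solved for, and floors and ceilings are ignored.

```latex
% Inductive hypothesis: T(m) \le c m for every m < n.
\begin{align*}
T(n) &\le T\!\left(\tfrac{n}{5}\right) + T\!\left(\tfrac{7n}{10}\right) + a n \\
     &\le \tfrac{c n}{5} + \tfrac{7 c n}{10} + a n \\
     &= \left(\tfrac{9c}{10} + a\right) n \\
     &\le c n \qquad \text{whenever } c \ge 10a .
\end{align*}
```

    The step only works because 1/5 + 7/10 is strictly less than 1, which leaves room to absorb the a n term.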

    Choosing 3-element blocks changes the count: at least half of the n/3 blocks have at least 2 elements >= the median-of-medians, which guarantees only 2 * (n/3)/2 = n/3 elements on one side, so the other partition can hold 2n/3 elements in the worst case.

    That gives T(n) = T(n/3) + T(2n/3) + O(n).

    In this case n/3 + 2n/3 = n, so the subproblems do not shrink in total and the bound degrades to O(n log n) in the worst case.
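
    For reference, a minimal Python sketch of the selection routine being analysed, with the block size as a parameter so the 3-versus-5 choice is easy to try. The function name, the `group_size` parameter, and the list-based three-way partition are choices made here for illustration; this is not an in-place or tuned implementation.

```python
def median_of_medians_select(items, k, group_size=5):
    """Return the k-th smallest element (0-based) of items.

    group_size is the block size discussed above: 5 gives the classic
    linear-time bound, 3 gives the variant with the weaker worst case.
    """
    if group_size < 3:
        raise ValueError("group_size must be at least 3")
    items = list(items)
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    while len(items) > group_size:
        # Median of each block; the last block may be shorter.
        medians = []
        for start in range(0, len(items), group_size):
            block = sorted(items[start:start + group_size])
            medians.append(block[len(block) // 2])
        # The pivot is the median of the block medians, found recursively.
        pivot = median_of_medians_select(medians, len(medians) // 2, group_size)
        # Three-way partition, so duplicates of the pivot cannot stall progress.
        smaller = [x for x in items if x < pivot]
        equal_count = sum(1 for x in items if x == pivot)
        if k < len(smaller):
            items = smaller
        elif k < len(smaller) + equal_count:
            return pivot
        else:
            k -= len(smaller) + equal_count
            items = [x for x in items if x > pivot]
    return sorted(items)[k]
```

    Both block sizes return correct results; the block size only affects how balanced the worst-case partition is guaranteed to be, and therefore the worst-case running time.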

  • 2020-12-15 00:09

    You can use blocks of size 3! Yes, I'm as surprised as you are. In 2014 (you asked in 2010) a paper appeared that shows how to do so.

    The idea is as follows: instead of doing median3, partition, median3, partition, ..., you do median3, median3, partition, median3, median3, partition, ... . In the paper this is called "The Repeated Step Algorithm".

    So instead of:

    T(n) <= T(n/3) + T(2n/3) + O(n)
    T(n) = O(n log n)
    

    one gets:

    T(n) <= T(n/9) + T(7n/9) + O(n)
    T(n) = Theta(n)
    

    The article in question is Select with Groups of 3 or 4 Takes Linear Time by K. Chen and A. Dumitrescu (2014, arXiv), or Select with Groups of 3 or 4 (2015, author's homepage).

    PS: See also Fast Deterministic Selection by A. Alexandrescu (of D language fame!), which shows how to implement the above even more efficiently.
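
    A rough Python sketch of the "median3, median3, partition" pattern as described in this answer. It is written from that description, not taken from either paper; the names `median_of_three_groups` and `repeated_step_select`, the cutoff of 9 elements, and the list-based partition are assumptions made here for illustration.

```python
def median_of_three_groups(values):
    """Return the median of each consecutive block of up to 3 values."""
    medians = []
    for start in range(0, len(values), 3):
        block = sorted(values[start:start + 3])
        medians.append(block[len(block) // 2])
    return medians


def repeated_step_select(items, k):
    """Return the k-th smallest element (0-based) of items."""
    items = list(items)
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    while len(items) > 9:
        # median3, median3: two grouping rounds before any partition.
        first_round = median_of_three_groups(items)         # about n/3 values
        second_round = median_of_three_groups(first_round)  # about n/9 values
        pivot = repeated_step_select(second_round, len(second_round) // 2)
        # partition: three-way, so duplicates of the pivot cannot stall progress.
        smaller = [x for x in items if x < pivot]
        equal_count = sum(1 for x in items if x == pivot)
        if k < len(smaller):
            items = smaller
        elif k < len(smaller) + equal_count:
            return pivot
        else:
            k -= len(smaller) + equal_count
            items = [x for x in items if x > pivot]
    return sorted(items)[k]
```

    The recursive call sees about n/9 values and the surviving side of the partition has at most about 7n/9, which matches the recurrence quoted above.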

  • 2020-12-15 00:13

    The reason is that by choosing blocks of 3, we might lose the guarantee of having an O(n) time algorithm.

    For blocks of 5, the time complexity is

    T(n) = T(n/5) + T(7n/10) + O(n)

    For blocks of 3, it comes out to be

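    T(n) = T(n/3) + T(2n/3) + O(n)

    Here the subproblem sizes sum to n instead of 9n/10, so the recurrence solves to O(n log n) rather than O(n).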

    Check this out: http://www.cs.berkeley.edu/~luca/w4231/fall99/slides/l3.pdf
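
    The difference between the two recurrences can also be seen numerically. The sketch below evaluates both with integer division and a linear term of exactly n; the function names and the base case T(0) = 0 are assumptions made here, not part of the answer.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def cost_blocks_of_5(n):
    """T(n) = T(n/5) + T(7n/10) + n, with floors and T(0) = 0."""
    if n <= 0:
        return 0
    return cost_blocks_of_5(n // 5) + cost_blocks_of_5(7 * n // 10) + n


@lru_cache(maxsize=None)
def cost_blocks_of_3(n):
    """T(n) = T(n/3) + T(2n/3) + n, with floors and T(0) = 0."""
    if n <= 0:
        return 0
    return cost_blocks_of_3(n // 3) + cost_blocks_of_3(2 * n // 3) + n


if __name__ == "__main__":
    # Print T(n)/n for both recurrences at a few sizes.
    for n in (10**3, 10**4, 10**5, 10**6):
        print(n, cost_blocks_of_5(n) / n, cost_blocks_of_3(n) / n)
```

    The ratio T(n)/n stays below 10 for the blocks-of-5 recurrence, while for blocks of 3 it keeps growing with n, which is the n log n behaviour.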
