Most significant vs. least significant radix sort

Submitted by 筅森魡賤 on 2019-11-29 02:39:53

An LSD radix sort can logically concatenate the sorted bins after each pass (treating them as a single bin when using a counting / radix sort). An MSD radix sort has to recursively sort each bin independently after each pass. If sorting by bytes, that is up to 256 bins after the first pass, 65536 bins after the second pass, 16777216 (16 million) bins after the third pass, and so on.
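As a rough illustration of that recursion, here is a minimal sketch of a byte-wise MSD radix sort for 32 bit unsigned integers. The names MsdRadixSort and MsdPass and the scratch vector are assumptions made for this example. Each bin produced by one pass is sorted independently on the next lower byte, which is where the growing number of bins comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: stably sort a[lo, hi) on the byte selected by shift,
// then recurse into each of the 256 bins on the next lower byte.
// tmp is scratch space with the same size as a.
static void MsdPass(std::vector<uint32_t>& a, std::vector<uint32_t>& tmp,
                    std::size_t lo, std::size_t hi, int shift)
{
    assert(lo <= hi && hi <= a.size() && tmp.size() == a.size());
    if (hi - lo < 2) return;                    // 0 or 1 element: nothing to do

    std::size_t start[257] = {0};               // start[k] = offset of bin k from lo
    for (std::size_t i = lo; i < hi; i++)       // histogram, shifted up by one slot
        start[((a[i] >> shift) & 0xff) + 1]++;
    for (std::size_t k = 0; k < 256; k++)       // prefix sums give bin boundaries
        start[k + 1] += start[k];

    std::size_t next[256];                      // next free slot in each bin
    for (std::size_t k = 0; k < 256; k++)
        next[k] = start[k];
    for (std::size_t i = lo; i < hi; i++)       // distribute into tmp, in input order
        tmp[lo + next[(a[i] >> shift) & 0xff]++] = a[i];
    for (std::size_t i = lo; i < hi; i++)       // copy the pass result back
        a[i] = tmp[i];

    if (shift == 0) return;                     // least significant byte done
    for (std::size_t k = 0; k < 256; k++)       // each bin is sorted on its own
        MsdPass(a, tmp, lo + start[k], lo + start[k + 1], shift - 8);
}

void MsdRadixSort(std::vector<uint32_t>& a)
{
    std::vector<uint32_t> tmp(a.size());
    MsdPass(a, tmp, 0, a.size(), 24);           // start with the most significant byte
}
```

Because every pass here distributes elements in input order, this particular sketch is stable. It pays for that with the scratch array and with up to 256 recursive calls per bin.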

This is why the old card sorters sort data LSD first. The video linked below shows one of these in action. The cards are fed in and drop into the chutes face down. In the video, the card sorter drops the cards into bins "0" to "9", then the operator takes the cards from the 0 bin, then takes the cards from the 1 bin and places them on top of (behind) the 0 bin cards, then the 2 bin cards go behind the deck, and so on, "concatenating" the cards from the bins. For large decks of cards, there would be a set of shelves above each bin on which to place the cards when the decks were too large to hold by hand.

http://www.youtube.com/watch?v=jJH2alRcx4M

Example C++ LSD radix sort for 32 bit unsigned integers, where each "digit" is a byte. Most of the code generates a matrix of counts which are converted into indices that mark the boundaries between variable size bins. The actual radix sort is in the last nested loop.

#include <cstddef>      // size_t
#include <cstdint>      // uint32_t
#include <utility>      // std::swap

//  a is input array, b is working array of the same size
//  returns a pointer to the sorted data (a, after an even number of passes)
uint32_t * RadixSort(uint32_t * a, uint32_t *b, size_t count)
{
    size_t mIndex[4][256] = {0};            // count / index matrix
    size_t i,j,m,n;
    uint32_t u;
    for(i = 0; i < count; i++){             // generate histograms
        u = a[i];
        for(j = 0; j < 4; j++){
            mIndex[j][(size_t)(u & 0xff)]++;
            u >>= 8;
        }
    }
    for(j = 0; j < 4; j++){                 // convert to indices
        m = 0;
        for(i = 0; i < 256; i++){
            n = mIndex[j][i];
            mIndex[j][i] = m;
            m += n;
        }
    }
    for(j = 0; j < 4; j++){                 // radix sort
        for(i = 0; i < count; i++){         //  sort by current lsb
            u = a[i];
            m = (size_t)(u>>(j<<3))&0xff;
            b[mIndex[j][m]++] = u;
        }
        std::swap(a, b);                    //  swap ptrs
    }
    return(a);
}

The part that's confusing you is that pretty much ALL LSD radix sorts preserve the order of duplicate keys. That's because they rely on this property to work at all. For example, if you have 2 iterations like this, sorting by first the ones place and then the tens place:

22        21        11
21   ->   11   ->   21
11        22        22

When we sort by tens we need to preserve the tie-breaking order we got when we sorted by ones, so that 21 and 22 come out in the proper order even though they have the same digit in the tens place. If you implement the first pass (by ones) the same way you have to implement all the later passes (and why wouldn't you?), then the whole sort is stable.
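The two passes in the example can be reproduced with a small stable counting pass on decimal digits. This is only a sketch: StablePassByDigit and its divisor parameter are names invented here for illustration, and the values are assumed to be non-negative.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: one stable counting-sort pass on the decimal digit
// selected by divisor (1 = ones place, 10 = tens place).
// Assumes all values are non-negative.
std::vector<int> StablePassByDigit(const std::vector<int>& in, int divisor)
{
    std::size_t index[10] = {0};
    for (int v : in)                            // histogram of the chosen digit
        index[(v / divisor) % 10]++;
    std::size_t sum = 0;
    for (std::size_t d = 0; d < 10; d++) {      // counts -> starting index of each bin
        std::size_t n = index[d];
        index[d] = sum;
        sum += n;
    }
    std::vector<int> out(in.size());
    for (int v : in)                            // scan in input order, so ties keep their order
        out[index[(v / divisor) % 10]++] = v;
    return out;
}
```

Calling it on {22, 21, 11} with divisor 1 gives {21, 11, 22}, and calling it on that result with divisor 10 gives {11, 21, 22}, matching the columns above. The second pass only works because 21 stays ahead of 22 within the "2" bin.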

An MSD radix sort can be written using the same kinds of sorting steps as an LSD radix sort, in which case it will be stable, too. But there are other, often more efficient ways to implement an MSD radix sort that don't have this property.

MSD-first radix sorts that don't preserve the order of duplicates are usually in-place, i.e., they work without allocating a separate array to hold the sorted elements.
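One well-known in-place variant is often called American flag sort. The following is a minimal sketch of that idea for 32 bit unsigned integers, not code from the answers above. The name InPlaceMsdSort and the default shift argument are assumptions for the example. Elements are swapped directly into their bins, so no second array is needed, but the swaps can reorder equal keys.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical in-place MSD radix sort (American flag style).
// Sorts a[0, count) on the byte selected by shift, then recurses per bin.
void InPlaceMsdSort(uint32_t* a, std::size_t count, int shift = 24)
{
    if (count < 2) return;

    std::size_t binEnd[256] = {0};              // first holds counts, then end of each bin
    std::size_t next[256];                      // next unplaced slot in each bin
    for (std::size_t i = 0; i < count; i++)
        binEnd[(a[i] >> shift) & 0xff]++;
    std::size_t sum = 0;
    for (std::size_t k = 0; k < 256; k++) {     // counts -> [next[k], binEnd[k]) ranges
        next[k] = sum;
        sum += binEnd[k];
        binEnd[k] = sum;
    }

    for (std::size_t k = 0; k < 256; k++) {     // permute elements into their bins
        while (next[k] < binEnd[k]) {
            std::size_t d = (a[next[k]] >> shift) & 0xff;
            if (d == k)
                next[k]++;                      // already in the right bin
            else
                std::swap(a[next[k]], a[next[d]++]);    // send it to bin d
        }
    }

    if (shift == 0) return;                     // least significant byte done
    std::size_t begin = 0;
    for (std::size_t k = 0; k < 256; k++) {     // sort each bin on the next byte
        InPlaceMsdSort(a + begin, binEnd[k] - begin, shift - 8);
        begin = binEnd[k];
    }
}
```

For plain integers the loss of stability is invisible, since equal keys are indistinguishable. It only shows up when each key carries additional data.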

NOTE that none of this makes any difference if you're just sorting a list of strings by comparing their ASCII code points. Preserving the order of duplicate keys only matters when the keys have extra information attached to them, for example if the keys have associated values, or if you are sorting in a case-independent manner and you want "Abe" and "abE" to come out in the same order they came in.
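To make the "Abe" / "abE" case concrete, here is a small sketch that uses std::stable_sort from the standard library. That is a comparison sort standing in for a stable radix sort, chosen only because it is short. LowerKey and SortCaseIndependentStable are hypothetical names for this example.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: lowercased copy used as the case-independent key.
static std::string LowerKey(const std::string& s)
{
    std::string k = s;
    for (char& c : k)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return k;
}

// Sorts by the lowercased key; strings whose keys compare equal
// keep the relative order they had in the input.
void SortCaseIndependentStable(std::vector<std::string>& names)
{
    std::stable_sort(names.begin(), names.end(),
        [](const std::string& x, const std::string& y) {
            return LowerKey(x) < LowerKey(y);
        });
}
```

With the input {"bob", "Abe", "abE", "ABE", "Al"} the three spellings of "abe" share one key, so a stable sort returns them as "Abe", "abE", "ABE" ahead of "Al" and "bob". An unstable sort would be free to return those three in any order.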
