How can I sort numbers lexicographically?

半阙折子戏 2020-12-09 04:56

Here is the scenario.

I am given an array 'A' of integers. The size of the array is not fixed. The function that I am supposed to write may be called once with an

14 answers
  • 2020-12-09 05:36

    If you want to try a better preprocess-sort-postprocess approach, note that an int is at most 10 decimal digits (ignoring signedness for the time being).

    So the binary-coded-decimal data for it fits in 64 bits. Map digit 0->1, 1->2, etc., and use 0 as a NUL terminator (to ensure that "1" comes out less than "10"). Shift each digit in turn, starting with the least significant, into the top of a long. Sort the longs as unsigned 64-bit values; they come out in lexicographical order for the original ints. Then convert back by shifting digits one at a time back out of the top of each long:

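    #include <stdint.h>

    /* Pack the decimal digits of i into the top of a 64-bit word, most
       significant digit highest, storing digit d as d+1 so that 0 ends the string. */
    uint64_t munge(uint32_t i) {
        uint64_t acc = 0;
        while (i > 0) {
            acc = acc >> 4;                  /* make room in the top nibble */
            uint64_t digit = (i % 10) + 1;   /* least significant digit first */
            acc += (digit << 60);
            i /= 10;
        }
        return acc;
    }
    
    /* Inverse of munge: shift digits back out of the top nibble until only
       the 0 terminator is left. */
    uint32_t demunge(uint64_t l) {
        uint32_t acc = 0;
        while (l > 0) {
            acc *= 10;
            uint32_t digit = (uint32_t)(l >> 60) - 1;
            acc += digit;
            l <<= 4;                         /* drop the digit just read */
        }
        return acc;
    }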
    

    Or something like that. Since Java doesn't have unsigned ints, you'd have to modify it a little. It uses a lot of working memory (twice the size of the input), but that's still less than your initial approach. It might be faster than converting to strings on the fly in the comparator, but it uses more peak memory. Depending on the GC, it might churn its way through less memory total, though, and require less collection.
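
    A rough Java adaptation follows, as a sketch that assumes non-negative inputs. Rather than comparing the longs as unsigned values (Arrays.sort(long[]) takes no comparator), it keeps the top digit in bits 56-59 instead of 60-63; an int needs only 40 bits of digits, so the sign bit is never set and the ordinary signed sort gives the lexicographic order. The class and method names are illustrative.

    ```java
    import java.util.Arrays;

    // Sketch of the digit-packing idea for non-negative Java ints.
    class DigitPackSort {
        // Top nibble lives at bits 56-59 (not 60-63), so bit 63 stays clear and
        // the signed comparison used by Arrays.sort(long[]) gives the right order.
        private static final int TOP = 56;
        private static final long MASK = (1L << (TOP + 4)) - 1;  // bits 0..59

        // Packs i's decimal digits, most significant highest, storing digit d as d+1.
        static long munge(int i) {                    // assumes i >= 0
            long acc = 0;
            while (i > 0) {
                acc >>>= 4;                           // make room in the top nibble
                acc |= (long) (i % 10 + 1) << TOP;    // least significant digit goes in first
                i /= 10;
            }
            return acc;
        }

        // Reads the digits back out of the top nibble until the 0 terminator.
        static int demunge(long l) {
            int acc = 0;
            while (l != 0) {
                acc = acc * 10 + ((int) (l >>> TOP) - 1);
                l = (l << 4) & MASK;                  // drop the digit just read
            }
            return acc;
        }

        static void sortLex(int[] a) {
            long[] packed = new long[a.length];       // the 2x working memory
            for (int k = 0; k < a.length; k++) packed[k] = munge(a[k]);
            Arrays.sort(packed);
            for (int k = 0; k < a.length; k++) a[k] = demunge(packed[k]);
        }
    }
    ```

    For example, sorting {10, 2, 1, 100, 0, 9, 21} this way gives {0, 1, 10, 100, 2, 21, 9}, the same order as comparing the strings.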

  • 2020-12-09 05:36

    The question doesn't indicate how to treat negative integers in the lexicographic collating order. The string-based methods presented earlier typically sort negative values to the front; e.g., { -123, -345, 0, 234, 78 } would be left in that order. But if the minus signs were supposed to be ignored, the output order should be { 0, -123, 234, -345, 78 }. One could adapt a string-based method to produce that order with somewhat cumbersome additional tests.
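
    For the sign-ignoring order, one such adaptation in Java is to compare the strings of the absolute values. This is a hypothetical sketch (the name ABS_LEX is made up); it widens to long so that Math.abs does not overflow on Integer.MIN_VALUE. Because Arrays.sort on objects is stable, x and -x keep their input order.

    ```java
    import java.util.Arrays;
    import java.util.Comparator;

    class SignIgnoringLex {
        // Compares the decimal strings of |x| and |y|, so minus signs play no part.
        static final Comparator<Integer> ABS_LEX = (x, y) ->
            Long.toString(Math.abs((long) x)).compareTo(Long.toString(Math.abs((long) y)));

        public static void main(String[] args) {
            Integer[] a = {-123, -345, 0, 234, 78};
            Arrays.sort(a, ABS_LEX);
            System.out.println(Arrays.toString(a));  // [0, -123, 234, -345, 78]
        }
    }
    ```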

    It may be simpler, in both theory and code, to use a comparator that compares the fractional parts of the common logarithms of two integers, that is, the mantissas of their base-10 logarithms. Whether a logarithm-based comparator runs faster or slower than a string-based one depends on the CPU's floating-point performance and on the quality of the implementations.
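
    To see why comparing mantissas tracks string order for positive integers, write an integer n with d digits as m times a power of ten, where {x} denotes the fractional part of x. Since log10 is increasing, comparing fractional parts compares m, i.e. the digits read as d1.d2d3..., which agrees with string comparison except when one number is the other times a power of ten.

    ```latex
    n = m \cdot 10^{d-1},\quad 1 \le m < 10
    \;\Rightarrow\;
    \log_{10} n = (d-1) + \log_{10} m,\quad 0 \le \log_{10} m < 1
    \;\Rightarrow\;
    \{\log_{10} n\} = \log_{10} m

    \{\log_{10} 19\} \approx 0.2788 \;<\; 0.3010 \approx \{\log_{10} 2\}
    ```

    The last line matches "19" sorting before "2" as strings.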

    The Java code at the end of this answer includes two logarithm-based comparators: alogCompare and slogCompare. The former ignores signs, so it would produce { 0, -123, 234, -345, 78 } from { -123, -345, 0, 234, 78 }.

    The number groups shown next are the output produced by the Java program.

    The “dar rand” section shows a random-data array dar as generated. It reads across and then down, 5 elements per line. Note that arrays sar, lara, and lars are initially unsorted copies of dar.

    The “dar sort” section shows dar after sorting via Arrays.sort(dar);.

    The “sar lex” section shows array sar after sorting with Arrays.sort(sar,lexCompare);, where lexCompare is similar to the Comparator shown in Jason Cohen's answer.

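    The “lar s log” section shows array lars after sorting by Arrays.sort(lars,slogCompare);, illustrating a logarithm-based method that gives the same order as lexCompare and the other string-based methods on this data. In general, though, numbers that differ only by trailing zeros (such as 12 and 120) have the same mantissa, so the logarithm-based comparators cannot order them reliably: they compare as equal or differ only by rounding error.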

    The “lar a log” section shows array lara after sorting by Arrays.sort(lara,alogCompare);, illustrating a logarithm-based method that ignores minus signs.

    dar rand    -335768    115776     -9576    185484     81528
    dar rand      79300         0      3128      4095    -69377
    dar rand     -67584      9900    -50568   -162792     70992
    
    dar sort    -335768   -162792    -69377    -67584    -50568
    dar sort      -9576         0      3128      4095      9900
    dar sort      70992     79300     81528    115776    185484
    
     sar lex    -162792   -335768    -50568    -67584    -69377
     sar lex      -9576         0    115776    185484      3128
     sar lex       4095     70992     79300     81528      9900
    
    lar s log    -162792   -335768    -50568    -67584    -69377
    lar s log      -9576         0    115776    185484      3128
    lar s log       4095     70992     79300     81528      9900
    
    lar a log          0    115776   -162792    185484      3128
    lar a log    -335768      4095    -50568    -67584    -69377
    lar a log      70992     79300     81528     -9576      9900
    

    Java code is shown below.

    // Code for "How can I sort numbers lexicographically?" - jw - 2 Jul 2014
    import java.util.Random;
    import java.util.Comparator;
    import java.lang.Math;
    import java.util.Arrays;
    public class lex882954 {
        // Comparator from Jason Cohen's answer
        public static Comparator<Integer> lexCompare = new Comparator<Integer>(){
            public int compare( Integer x, Integer y ) {
                return x.toString().compareTo( y.toString() );
            }
        };
        // Comparator that uses "abs." logarithms of numbers instead of strings
        public static Comparator<Integer> alogCompare = new Comparator<Integer>(){
            public int compare( Integer x, Integer y ) {
                Double xl = (x==0)? 0 : Math.log10(Math.abs(x));
                Double yl = (y==0)? 0 : Math.log10(Math.abs(y));
                Double xf=xl-xl.intValue();
                return xf.compareTo(yl-yl.intValue());
            }
        };
        // Comparator that uses "signed" logarithms of numbers instead of strings
        public static Comparator<Integer> slogCompare = new Comparator<Integer>(){
            public int compare( Integer x, Integer y ) {
                Double xl = (x==0)? 0 : Math.log10(Math.abs(x));
                Double yl = (y==0)? 0 : Math.log10(Math.abs(y));
                Double xf=xl-xl.intValue()+Integer.signum(x);
                return xf.compareTo(yl-yl.intValue()+Integer.signum(y));
            }
        };
        // Print array before or after sorting
        public static void printArr(Integer[] ar, int asize, String aname) {
            int j;
            for(j=0; j < asize; ++j) {
                if (j%5==0)
                    System.out.printf("%n%8s ", aname);
                System.out.printf(" %9d", ar[j]);
            }
            System.out.println();
        }
        // Main Program -- to test comparators
        public static void main(String[] args) {
            int j, dasize=15, hir=99;
            Random rnd = new Random(12345);
            Integer[] dar = new Integer[dasize];
            Integer[] sar = new Integer[dasize];
            Integer[] lara = new Integer[dasize];
            Integer[] lars = new Integer[dasize];
    
            for(j=0; j < dasize; ++j) {
                lara[j] = lars[j] = sar[j] = dar[j] = rnd.nextInt(hir) * 
                    rnd.nextInt(hir) * (rnd.nextInt(hir)-44);
            }
            printArr(dar, dasize, "dar rand");
            Arrays.sort(dar);
            printArr(dar, dasize, "dar sort");
            Arrays.sort(sar, lexCompare);
            printArr(sar, dasize, "sar lex");
            Arrays.sort(lars, slogCompare);
            printArr(lars, dasize, "lar s log");
            Arrays.sort(lara, alogCompare);
            printArr(lara, dasize, "lar a log");
        }
    }
    
  • 2020-12-09 05:38

    Pseudocode:

    sub sort_numbers_lexicographically (array) {
        for 0 <= i < array.length:
            array[i] = munge(array[i]);
        sort(array);  // using usual numeric comparisons
        for 0 <= i < array.length:
            array[i] = unmunge(array[i]);
    }
    

    So, what are munge and unmunge?

    munge is different depending on the integer size. For example:

    sub munge (4-bit-unsigned-integer n) {
        switch (n):
            case 0:  return 0
            case 1:  return 1
            case 2:  return 8
            case 3:  return 9
            case 4:  return 10
            case 5:  return 11
            case 6:  return 12
            case 7:  return 13
            case 8:  return 14
            case 9:  return 15
            case 10:  return 2
            case 11:  return 3
            case 12:  return 4
            case 13:  return 5
            case 14:  return 6
            case 15:  return 7
    }
    

    Essentially, munge says in what order 4-bit integers come when sorted lexicographically. I'm sure you can see that there is a pattern here (I didn't have to use a switch), and that you can write a version of munge that handles 32-bit integers reasonably easily. Think about how you would write versions of munge for 5-, 6-, and 7-bit integers if you can't immediately see the pattern.

    unmunge is the inverse of munge.

    So you can avoid converting anything to a string; you don't need any extra memory.
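
    One way to write the general munge, offered as a sketch rather than as the pattern the author had in mind: take munge(n) to be the position of n when all integers in [0, max] are listed in string order, which reproduces the 4-bit table above for max = 15. The sketch computes that position by counting how many integers in range start with each smaller prefix, and unmunge walks the same prefix tree to invert it, so no strings are built. The class name and the max parameter are illustrative; it assumes 0 <= n <= max.

    ```java
    import java.util.Arrays;

    // Sketch: munge(n, max) = rank of n among 0..max in lexicographic order.
    class LexRank {
        // Number of integers in [1, max] whose decimal form starts with prefix p (p >= 1).
        static long countWithPrefix(long p, long max) {
            long count = 0;
            for (long lo = p, hi = p; lo <= max; lo *= 10, hi = hi * 10 + 9) {
                count += Math.min(hi, max) - lo + 1;
            }
            return count;
        }

        static int munge(int n, int max) {            // assumes 0 <= n <= max
            if (n == 0) return 0;                     // "0" precedes every other number
            long pow = 1;                             // highest power of 10 <= n
            while (pow * 10 <= n) pow *= 10;
            long rank = 1;                            // counts "0"
            long prefix = 0;                          // leading digits of n consumed so far
            for (long p10 = pow; p10 >= 1; p10 /= 10) {
                int digit = (int) (n / p10 % 10);
                // numbers matching n so far but with a smaller digit here sort first
                for (int c = (prefix == 0 ? 1 : 0); c < digit; c++) {
                    rank += countWithPrefix(prefix * 10 + c, max);
                }
                prefix = prefix * 10 + digit;
                if (p10 > 1) rank++;                  // a proper prefix of n sorts before n
            }
            return (int) rank;
        }

        static int unmunge(int r, int max) {          // assumes 0 <= r <= max
            if (r == 0) return 0;
            long cur = 1;
            long k = r - 1;                           // steps left in a preorder walk from "1"
            while (k > 0) {
                long steps = countWithPrefix(cur, max);
                if (steps <= k) { k -= steps; cur++; }  // skip cur and everything under it
                else { k--; cur *= 10; }                // descend to cur's first child
            }
            return (int) cur;
        }

        static void sortLex(int[] a, int max) {
            for (int i = 0; i < a.length; i++) a[i] = munge(a[i], max);
            Arrays.sort(a);                           // ordinary numeric sort on the ranks
            for (int i = 0; i < a.length; i++) a[i] = unmunge(a[i], max);
        }
    }
    ```

    Each munge or unmunge call does a few dozen counting steps instead of building a string, and sortLex works in place, in the spirit of the answer's "no extra memory" claim.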

  • 2020-12-09 05:38

    If you're going for space efficiency, I'd try just doing the work in the comparison function of the sort:

    int compare(int a, int b) {
       // convert a and b to strings and compare those;
       // return -1 if a < b, 0 if they are equal, 1 if a > b
       return Integer.signum(Integer.toString(a).compareTo(Integer.toString(b)));
    }
    

    If it's too slow (it's slower than preprocessing, for sure), keep track of the conversions somewhere so that the comparison function doesn't keep having to do them.
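
    One possible way to keep track of the conversions, sketched in Java under the assumption that the values are boxed in an Integer[]: memoize each value's string in a HashMap (the class name is illustrative). The cache holds one String per distinct value, so it gives back some of the space the comparator approach was meant to save.

    ```java
    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;

    // Sketch: a lexicographic comparator that remembers each value's string form,
    // so repeated comparisons of the same value don't redo the conversion.
    class CachedLexCompare {
        static void sortLex(Integer[] a) {
            Map<Integer, String> cache = new HashMap<>();
            Arrays.sort(a, (x, y) -> {
                String sx = cache.computeIfAbsent(x, k -> Integer.toString(k));
                String sy = cache.computeIfAbsent(y, k -> Integer.toString(k));
                return sx.compareTo(sy);
            });
        }
    }
    ```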

  • 2020-12-09 05:41

    Executable pseudocode (aka Python): thenumbers.sort(key=str). Yes, I know that using Python is kind of like cheating; it's just too powerful ;-). But seriously, this also means that if you can sort an array of strings lexicographically, as Python's sort intrinsically can, you can make a "key string" out of each number and sort that auxiliary array. You can then reconstruct the desired numbers array by a str->int transformation, or by doing the sort on the indices via indirection, and so on. This is known as DSU (Decorate, Sort, Undecorate), and it's what the key= argument to Python's sort implements.

    In more detail (pseudocode):

    1. allocate an array aux of char* (one string per number), as long as the numbers array
    2. for i from 0 to length of numbers-1, aux[i]=stringify(numbers[i])
    3. allocate an array of int indices of the same length
    4. for i from 0 to length of numbers-1, indices[i]=i
    5. sort indices, using as cmp(i,j) strcmp(aux[i],aux[j])
    6. allocate an array of int results of the same length
    7. for i from 0 to length of numbers-1, results[i]=numbers[indices[i]]
    8. memcpy results over numbers
    9. free every aux[i], and also aux, indices, results
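
    The same steps translate fairly directly to Java, as a sketch: String replaces char*, System.arraycopy plays the role of memcpy, and step 9 is left to the garbage collector. The class name is illustrative.

    ```java
    import java.util.Arrays;
    import java.util.Comparator;

    // Java rendering of the DSU steps above.
    class DsuLexSort {
        static void sortLex(int[] numbers) {
            int n = numbers.length;
            String[] aux = new String[n];                 // steps 1-2: decorate
            for (int i = 0; i < n; i++) aux[i] = Integer.toString(numbers[i]);
            Integer[] indices = new Integer[n];           // steps 3-4: identity permutation
            for (int i = 0; i < n; i++) indices[i] = i;
            // step 5: sort the indices by comparing the strings they point to
            Arrays.sort(indices, Comparator.comparing((Integer i) -> aux[i]));
            int[] results = new int[n];                   // steps 6-7: undecorate
            for (int i = 0; i < n; i++) results[i] = numbers[indices[i]];
            System.arraycopy(results, 0, numbers, 0, n);  // step 8: copy back
        }
    }
    ```
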
  • 2020-12-09 05:42

    Possible optimization: instead of this,

    “I converted each integer to its string format, then added zeros to its right to make all the integers contain the same number of digits”

    you can multiply each number by 10^(N - floor(log10(number))), N being an integer larger than log10 of any of your numbers.
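
    A small Java sketch of this padding idea for non-negative ints. It pads by multiplying by 10 until the value has 10 digits, which is the same power-of-ten multiplication done without a logarithm, and keeps the keys in a long because, for example, 9 padded to 10 digits is 9,000,000,000, beyond int range. The tie-break on the original value is an addition of this sketch: padding alone maps 1, 10, and 100 to the same key, while string order puts them as 1 < 10 < 100. The class name is illustrative.

    ```java
    import java.util.Arrays;

    class PadSort {
        // Pads x (assumed >= 0) with zeros on the right to 10 digits, the most an int has.
        static long padded(int x) {
            if (x == 0) return 0;                  // "0" sorts first; also avoids an endless loop
            long key = x;
            while (key < 1_000_000_000L) key *= 10;
            return key;
        }

        static void sortLex(Integer[] a) {
            Arrays.sort(a, (x, y) -> {
                int c = Long.compare(padded(x), padded(y));
                // Equal keys mean one number is the other times a power of ten;
                // the smaller one is a prefix of the larger, so it comes first.
                return c != 0 ? c : Integer.compare(x, y);
            });
        }
    }
    ```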
