How can I sort numbers lexicographically?


Here is the scenario.

I am given an array 'A' of integers. The size of the array is not fixed. The function that I am supposed to write may be called once with an

14 Answers

失恋的感觉

2020-12-09 05:36
If you want to try a better preprocess-sort-postprocess, then note that an int is at most 10 decimal digits (ignoring signedness for the time being).

So its binary-coded-decimal representation (4 bits per digit, 40 bits in total) fits in 64 bits. Map digit 0 to 1, 1 to 2, and so on, and use 0 as a NUL terminator (to ensure that "1" comes out less than "10"). Shift each digit in turn, starting with the least significant, into the top of a 64-bit long. Sort the longs as unsigned values; they come out in lexicographical order of the original ints. Then convert back by shifting the digits one at a time out of the top of each long:
```
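#include <stdint.h>

/* Pack the decimal digits of i into the top of a 64-bit key, 4 bits per digit. */
uint64_t munge(uint32_t i) {
    uint64_t acc = 0;
    while (i > 0) {
        acc = acc >> 4;                  /* make room at the top for the next digit */
        uint64_t digit = (i % 10) + 1;   /* digits 0..9 are stored as 1..10 */
        acc += (digit << 60);
        i /= 10;
    }
    return acc;
}

/* Inverse of munge: read the digits back out of the top nibble. */
uint32_t demunge(uint64_t l) {
    uint32_t acc = 0;
    while (l > 0) {
        acc *= 10;
        uint32_t digit = (uint32_t)(l >> 60) - 1;
        acc += digit;
        l <<= 4;                         /* drop the digit just read */
    }
    return acc;
}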
```
Or something like that. Since Java doesn't have unsigned ints, you'd have to modify it a little. It uses a lot of working memory (twice the size of the input), but that's still less than your initial approach. It might be faster than converting to strings on the fly in the comparator, but it uses more peak memory. Depending on the GC, it might churn its way through less memory total, though, and require less collection.
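
For illustration, here is a minimal Python sketch of the same preprocess-sort-postprocess idea, assuming non-negative inputs that fit in 32 bits. The name lex_sort is hypothetical, and the explicit 64-bit mask only stands in for the overflow that a fixed-width unsigned type would provide for free.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # Python ints are unbounded, so emulate a 64-bit word


def munge(n: int) -> int:
    """Pack the decimal digits of n into the top of a 64-bit key, 4 bits each."""
    acc = 0
    while n > 0:
        acc >>= 4                      # make room at the top for the next digit
        acc += ((n % 10) + 1) << 60    # digits 0..9 are stored as 1..10; 0 marks the end
        n //= 10
    return acc


def demunge(key: int) -> int:
    """Inverse of munge: read the digits back out of the top nibble."""
    n = 0
    while key > 0:
        n = n * 10 + (key >> 60) - 1
        key = (key << 4) & MASK64      # drop the digit just read
    return n


def lex_sort(numbers):
    """Sort non-negative ints in lexicographic order of their decimal strings."""
    return [demunge(k) for k in sorted(munge(n) for n in numbers)]


print(lex_sort([10, 9, 1, 100, 21, 2, 0]))  # [0, 1, 10, 100, 2, 21, 9]
```

Because the top nibble can hold the value 10, the keys only sort correctly when compared as unsigned 64-bit values, which is why a Java port would need some adjustment.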

情歌与酒

2020-12-09 05:36

The question doesn't indicate how to treat negative integers in the lexicographic collating order. The string-based methods presented earlier typically will sort negative values to the front; e.g., { -123, -345, 0, 234, 78 } would be left in that order. But if the minus signs were supposed to be ignored, the output order should be { 0, -123, 234, -345, 78 }. One could adapt a string-based method to produce that order with some rather cumbersome additional tests.

It may be simpler, in both theory and code, to use a comparator that compares the fractional parts of the common logarithms of two integers. That is, it will compare the mantissas of the base 10 logarithms of the two numbers. A logarithm-based comparator will run faster or slower than a string-based comparator, depending on a CPU's floating-point performance and on the quality of the implementations. Note that numbers whose decimal strings differ only by trailing zeros (such as 1, 10, and 100) have the same mantissa, so such a comparator treats them as equal, whereas a string comparison orders them by length.

The Java code shown at the end of this answer includes two logarithm-based comparators: alogCompare and slogCompare. The former ignores signs, so would produce { 0, -123, 234, -345, 78 } from { -123, -345, 0, 234, 78 }.
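
As a quick illustration of the sign-ignoring variant, here is a minimal Python sketch that uses the mantissa as a sort key. The name mantissa_key is hypothetical, and mapping 0 to a key of 0.0 is an assumption that mirrors what alogCompare does.

```python
import math


def mantissa_key(n: int) -> float:
    """Fractional part of log10(|n|); zero is given the key 0.0."""
    if n == 0:
        return 0.0
    log = math.log10(abs(n))
    return log - math.floor(log)


values = [-123, -345, 0, 234, 78]
print(sorted(values, key=mantissa_key))  # [0, -123, 234, -345, 78]
```

Since Python's sort is stable, values with equal keys (for example 1, 10, and 100) simply keep their input order here.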

The number groups shown next are the output produced by the Java program.

The “dar rand” section shows a random-data array dar as generated. It reads across and then down, 5 elements per line. Note, arrays sar, lara, and lars initially are unsorted copies of dar.

The “dar sort” section is dar after sorting via Arrays.sort(dar);.

The “sar lex” section shows array sar after sorting with Arrays.sort(sar,lexCompare);, where lexCompare is similar to the Comparator shown in Jason Cohen's answer.

The “lar s log” section shows array lars after sorting by Arrays.sort(lars,slogCompare);, illustrating a logarithm-based method that gives the same order as do lexCompare and other string-based methods.

The “lar a log” section shows array lara after sorting by Arrays.sort(lara,alogCompare);, illustrating a logarithm-based method that ignores minus signs.

```
dar rand    -335768    115776     -9576    185484     81528
dar rand      79300         0      3128      4095    -69377
dar rand     -67584      9900    -50568   -162792     70992

dar sort    -335768   -162792    -69377    -67584    -50568
dar sort      -9576         0      3128      4095      9900
dar sort      70992     79300     81528    115776    185484

 sar lex    -162792   -335768    -50568    -67584    -69377
 sar lex      -9576         0    115776    185484      3128
 sar lex       4095     70992     79300     81528      9900

lar s log    -162792   -335768    -50568    -67584    -69377
lar s log      -9576         0    115776    185484      3128
lar s log       4095     70992     79300     81528      9900

lar a log          0    115776   -162792    185484      3128
lar a log    -335768      4095    -50568    -67584    -69377
lar a log      70992     79300     81528     -9576      9900
```

Java code is shown below.

```
// Code for "How can I sort numbers lexicographically?" - jw - 2 Jul 2014
import java.util.Random;
import java.util.Comparator;
import java.lang.Math;
import java.util.Arrays;
public class lex882954 {
// Comparator from Jason Cohen's answer
    public static Comparator<Integer> lexCompare = new Comparator<Integer>(){
        public int compare( Integer x, Integer y ) {
            return x.toString().compareTo( y.toString() );
        }
    };
// Comparator that uses "abs." logarithms of numbers instead of strings
    public static Comparator<Integer> alogCompare = new Comparator<Integer>(){
        public int compare( Integer x, Integer y ) {
            Double xl = (x==0)? 0 : Math.log10(Math.abs(x));
            Double yl = (y==0)? 0 : Math.log10(Math.abs(y));
            Double xf=xl-xl.intValue();
            return xf.compareTo(yl-yl.intValue());
        }
    };
// Comparator that uses "signed" logarithms of numbers instead of strings
    public static Comparator<Integer> slogCompare = new Comparator<Integer>(){
        public int compare( Integer x, Integer y ) {
            Double xl = (x==0)? 0 : Math.log10(Math.abs(x));
            Double yl = (y==0)? 0 : Math.log10(Math.abs(y));
            Double xf=xl-xl.intValue()+Integer.signum(x);
            return xf.compareTo(yl-yl.intValue()+Integer.signum(y));
        }
    };
// Print array before or after sorting
    public static void printArr(Integer[] ar, int asize, String aname) {
        int j;
        for(j=0; j < asize; ++j) {
            if (j%5==0)
                System.out.printf("%n%8s ", aname);
            System.out.printf(" %9d", ar[j]);
        }
        System.out.println();
    }
// Main Program -- to test comparators
    public static void main(String[] args) {
        int j, dasize=15, hir=99;
        Random rnd = new Random(12345);
        Integer[] dar = new Integer[dasize];
        Integer[] sar = new Integer[dasize];
        Integer[] lara = new Integer[dasize];
        Integer[] lars = new Integer[dasize];

        for(j=0; j < dasize; ++j) {
            lara[j] = lars[j] = sar[j] = dar[j] = rnd.nextInt(hir) *
                rnd.nextInt(hir) * (rnd.nextInt(hir)-44);
        }
        printArr(dar, dasize, "dar rand");
        Arrays.sort(dar);
        printArr(dar, dasize, "dar sort");
        Arrays.sort(sar, lexCompare);
        printArr(sar, dasize, "sar lex");
        Arrays.sort(lars, slogCompare);
        printArr(lars, dasize, "lar s log");
        Arrays.sort(lara, alogCompare);
        printArr(lara, dasize, "lar a log");
    }
}
```


面向向阳花

2020-12-09 05:38

Pseudocode:

```
sub sort_numbers_lexicographically (array) {
    for 0 <= i < array.length:
        array[i] = munge(array[i]);
    sort(array);  // using usual numeric comparisons
    for 0 <= i < array.length:
        array[i] = unmunge(array[i]);
}
```

So, what are munge and unmunge?

munge is different depending on the integer size. For example:

```
sub munge (4-bit-unsigned-integer n) {
    switch (n):
        case 0:  return 0
        case 1:  return 1
        case 2:  return 8
        case 3:  return 9
        case 4:  return 10
        case 5:  return 11
        case 6:  return 12
        case 7:  return 13
        case 8:  return 14
        case 9:  return 15
        case 10:  return 2
        case 11:  return 3
        case 12:  return 4
        case 13:  return 5
        case 14:  return 6
        case 15:  return 7
}
```

Essentially, munge returns the position each 4-bit integer occupies when all of them are sorted lexicographically. I'm sure you can see that there is a pattern here (I didn't have to use a switch) and that you can write a version of munge that handles 32-bit integers reasonably easily. Think about how you would write versions of munge for 5-, 6-, and 7-bit integers if you can't immediately see the pattern.

unmunge is the inverse of munge.

So you can avoid converting anything to a string, and you don't need any extra memory.
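
One possible way to compute munge arithmetically is sketched below in Python. This is only an assumption about what such a pattern could look like, not necessarily the one the answer has in mind. The name lex_rank and its max_value parameter are hypothetical, the len(str(...)) calls are used only to count digits, and unmunge (the inverse) is not shown.

```python
def lex_rank(n: int, max_value: int) -> int:
    """Count the integers in [0, max_value] whose decimal string sorts before str(n)."""
    if n == 0:
        return 0                          # "0" sorts before every other number
    digits = len(str(n))
    rank = 1                              # account for 0 itself
    for length in range(1, len(str(max_value)) + 1):
        lowest = 10 ** (length - 1)       # smallest number with `length` digits
        if length <= digits:
            bound = n // 10 ** (digits - length)   # leading `length` digits of n
        else:
            bound = n * 10 ** (length - digits)    # n followed by zeros
        # numbers with `length` digits that lie below the bound and within range
        rank += max(0, min(bound, max_value + 1) - lowest)
        if length < digits:
            rank += 1                     # a proper prefix of n also sorts before n
    return rank


# Reproduces the 4-bit table from the switch statement above.
print([lex_rank(n, 15) for n in range(16)])
# [0, 1, 8, 9, 10, 11, 12, 13, 14, 15, 2, 3, 4, 5, 6, 7]
```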


再見小時候

2020-12-09 05:38
If you're going for space efficiency, I'd try just doing the work in the comparison function of the sort:
```
int compare(int a, int b) {
   // convert a to string
   // convert b to string
   // return -1 if a < b, 0 if they are equal, 1 if a > b
}
```
If it's too slow (it's slower than preprocessing, for sure), keep track of the conversions somewhere so that the comparison function doesn't keep having to do them.
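
As a rough Python sketch of both suggestions, the comparator below converts on the fly, and a cache keeps track of conversions that were already done. The helper name as_text is hypothetical, and the unbounded lru_cache is just one assumed way of remembering the strings; it trades the memory savings back for speed.

```python
from functools import cmp_to_key, lru_cache


@lru_cache(maxsize=None)
def as_text(n: int) -> str:
    """Convert n to its decimal string, remembering earlier conversions."""
    return str(n)


def compare(a: int, b: int) -> int:
    """Return -1 if a sorts before b lexicographically, 0 if equal, 1 otherwise."""
    sa, sb = as_text(a), as_text(b)
    return (sa > sb) - (sa < sb)


numbers = [10, 9, 100, 1, 21]
numbers.sort(key=cmp_to_key(compare))   # sorts in place
print(numbers)  # [1, 10, 100, 21, 9]
```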
执念已碎

2020-12-09 05:41
Executable pseudo-code (aka Python): thenumbers.sort(key=str). Yeah, I know that using Python is kind of like cheating -- it's just too powerful;-). But seriously, this also means: if you can sort an array of strings lexicographically, as Python's sort intrinsically can, then just make the "key string" out of each number and sort that auxiliary array (you can then reconstruct the desired numbers array by a str->int transformation, or by doing the sort on the indices via indirection, etc etc); this is known as DSU (Decorate, Sort, Undecorate) and it's what the key= argument to Python's sort implements.

In more detail (pseudocode):
1. allocate an array aux of char* (one string pointer per number), as long as the numbers array
2. for i from 0 to length of numbers-1, aux[i]=stringify(numbers[i])
3. allocate an array of int indices of the same length
4. for i from 0 to length of numbers-1, indices[i]=i
5. sort indices, using as cmp(i,j) strcmp(aux[i],aux[j])
6. allocate an array of int results of the same length
7. for i from 0 to length of numbers-1, results[i]=numbers[indices[i]]
8. memcpy results over numbers
9. free every aux[i], and also aux, indices, results
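
The numbered steps above can be sketched in Python roughly as follows. The function name sort_lexicographically is hypothetical, and step 9 has no counterpart because Python's garbage collector reclaims the temporary lists.

```python
def sort_lexicographically(numbers: list) -> None:
    """Reorder `numbers` in place by the lexicographic order of their strings."""
    aux = [str(n) for n in numbers]            # steps 1-2: stringify each number
    indices = list(range(len(numbers)))        # steps 3-4: identity permutation
    indices.sort(key=lambda i: aux[i])         # step 5: sort indices by their strings
    results = [numbers[i] for i in indices]    # steps 6-7: gather in sorted order
    numbers[:] = results                       # step 8: copy back over the input


data = [10, 9, 100, 1, 21]
sort_lexicographically(data)
print(data)  # [1, 10, 100, 21, 9]
```

In everyday Python, data.sort(key=str) does the same thing in a single call.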
攒了一身酷

2020-12-09 05:42

Possible optimization: Instead of this:

I converted each integer to its string format, then added zeros to its right to make all the integers contain the same number of digits

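you can multiply each (positive) number by 10^(N - floor(log10(number))), N being a number larger than log10 of any of your numbers. For example, with N = 3, the values 9, 45, and 123 become 9000, 4500, and 1230, which sort numerically into the lexicographic order 123, 45, 9.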
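
A minimal Python sketch of this scaling idea follows, assuming a non-empty list of positive integers. The names digit_count, scaled_key, and lex_sort_positive are hypothetical. Using the original number as a tie-breaker is an addition of this sketch; it separates values such as 5 and 50, which scale to the same key.

```python
def digit_count(n: int) -> int:
    """Number of decimal digits of a positive integer, without floating point."""
    count = 1
    while n >= 10:
        n //= 10
        count += 1
    return count


def scaled_key(n: int, width: int):
    """Scale n up to exactly `width` digits; n itself breaks ties such as 5 vs 50."""
    return (n * 10 ** (width - digit_count(n)), n)


def lex_sort_positive(numbers):
    """Return the positive ints sorted as if their decimal strings were compared."""
    width = max(digit_count(n) for n in numbers)
    return sorted(numbers, key=lambda n: scaled_key(n, width))


print(lex_sort_positive([9, 50, 123, 5, 45]))  # [123, 45, 5, 50, 9]
```

Counting digits with integer division avoids the rounding problems that log10 can have near powers of ten.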
