Here is the scenario.
I am given an array \'A\' of integers. The size of the array is not fixed. The function that I am supposed to write may be called once with an
If you want to try a better preprocess-sort-postprocess, then note that an int is at most 10 decimal digits (ignoring signed-ness for the time being).
So the binary-coded-decimal data for it fits in 64 bits. Map digit 0->1, 1->2 etc, and use 0 as a NUL terminator (to ensure that "1" comes out less than "10"). Shift each digit in turn, starting with the smallest, into the top of a long. Sort the longs, which will come out in lexicographical order for the original ints. Then convert back by shifting digits one at a time back out of the top of each long:
uint64_t munge(uint32_t i) {
uint64_t acc = 0;
while (i > 0) {
acc = acc >> 4;
uint64_t digit = (i % 10) + 1;
acc += (digit << 60);
i /= 10;
}
return acc;
}
uint32_t demunge(uint64_t l) {
uint32_t acc = 0;
while (l > 0) {
acc *= 10;
uint32_t digit = (l >> 60) - 1;
acc += digit;
l << 4;
}
}
Or something like that. Since Java doesn't have unsigned ints, you'd have to modify it a little. It uses a lot of working memory (twice the size of the input), but that's still less than your initial approach. It might be faster than converting to strings on the fly in the comparator, but it uses more peak memory. Depending on the GC, it might churn its way through less memory total, though, and require less collection.
The question doesn't indicate how to treat negative integers in the lexicographic collating order. The string-based methods presented earlier typically will sort negative values to the front; eg, { -123, -345, 0, 234, 78 } would be left in that order. But if the minus signs were supposed to be ignored, the output order should be { 0, -123, 234, -345, 78 }. One could adapt a string-based method to produce that order by somewhat-cumbersome additional tests.
It may be simpler, in both theory and code, to use a comparator that compares fractional parts of common logarithms of two integers. That is, it will compare the mantissas of base 10 logarithms of two numbers. A logarithm-based comparator will run faster or slower than a string-based comparator, depending on a CPU's floating-point performance specs and on quality of implementations.
The java code shown at the end of this answer includes two logarithm-based comparators: alogCompare
and slogCompare
. The former ignores signs, so would produce { 0, -123, 234, -345, 78 } from { -123, -345, 0, 234, 78 }.
The number-groups shown next are the output produced by the java program.
The “dar rand” section shows a random-data array dar
as generated. It reads across and then down, 5 elements per line. Note, arrays sar
, lara
, and lars
initially are unsorted copies of dar
.
The “dar sort” section is dar
after sorting via Arrays.sort(dar);
.
The “sar lex” section shows array sar
after sorting with Arrays.sort(sar,lexCompare);
, where lexCompare
is similar to the Comparator
shown in Jason Cohen's answer.
The “lar s log” section shows array lars
after sorting by Arrays.sort(lars,slogCompare);
, illustrating a logarithm-based method that gives the same order as do lexCompare
and other string-based methods.
The “lar a log” section shows array lara
after sorting by Arrays.sort(lara,alogCompare);
, illustrating a logarithm-based method that ignores minus signs.
dar rand -335768 115776 -9576 185484 81528
dar rand 79300 0 3128 4095 -69377
dar rand -67584 9900 -50568 -162792 70992
dar sort -335768 -162792 -69377 -67584 -50568
dar sort -9576 0 3128 4095 9900
dar sort 70992 79300 81528 115776 185484
sar lex -162792 -335768 -50568 -67584 -69377
sar lex -9576 0 115776 185484 3128
sar lex 4095 70992 79300 81528 9900
lar s log -162792 -335768 -50568 -67584 -69377
lar s log -9576 0 115776 185484 3128
lar s log 4095 70992 79300 81528 9900
lar a log 0 115776 -162792 185484 3128
lar a log -335768 4095 -50568 -67584 -69377
lar a log 70992 79300 81528 -9576 9900
Java code is shown below.
// Code for "How can I sort numbers lexicographically?" - jw - 2 Jul 2014
import java.util.Random;
import java.util.Comparator;
import java.lang.Math;
import java.util.Arrays;
public class lex882954 {
// Comparator from Jason Cohen's answer
public static Comparator<Integer> lexCompare = new Comparator<Integer>(){
public int compare( Integer x, Integer y ) {
return x.toString().compareTo( y.toString() );
}
};
// Comparator that uses "abs." logarithms of numbers instead of strings
public static Comparator<Integer> alogCompare = new Comparator<Integer>(){
public int compare( Integer x, Integer y ) {
Double xl = (x==0)? 0 : Math.log10(Math.abs(x));
Double yl = (y==0)? 0 : Math.log10(Math.abs(y));
Double xf=xl-xl.intValue();
return xf.compareTo(yl-yl.intValue());
}
};
// Comparator that uses "signed" logarithms of numbers instead of strings
public static Comparator<Integer> slogCompare = new Comparator<Integer>(){
public int compare( Integer x, Integer y ) {
Double xl = (x==0)? 0 : Math.log10(Math.abs(x));
Double yl = (y==0)? 0 : Math.log10(Math.abs(y));
Double xf=xl-xl.intValue()+Integer.signum(x);
return xf.compareTo(yl-yl.intValue()+Integer.signum(y));
}
};
// Print array before or after sorting
public static void printArr(Integer[] ar, int asize, String aname) {
int j;
for(j=0; j < asize; ++j) {
if (j%5==0)
System.out.printf("%n%8s ", aname);
System.out.printf(" %9d", ar[j]);
}
System.out.println();
}
// Main Program -- to test comparators
public static void main(String[] args) {
int j, dasize=15, hir=99;
Random rnd = new Random(12345);
Integer[] dar = new Integer[dasize];
Integer[] sar = new Integer[dasize];
Integer[] lara = new Integer[dasize];
Integer[] lars = new Integer[dasize];
for(j=0; j < dasize; ++j) {
lara[j] = lars[j] = sar[j] = dar[j] = rnd.nextInt(hir) *
rnd.nextInt(hir) * (rnd.nextInt(hir)-44);
}
printArr(dar, dasize, "dar rand");
Arrays.sort(dar);
printArr(dar, dasize, "dar sort");
Arrays.sort(sar, lexCompare);
printArr(sar, dasize, "sar lex");
Arrays.sort(lars, slogCompare);
printArr(lars, dasize, "lar s log");
Arrays.sort(lara, alogCompare);
printArr(lara, dasize, "lar a log");
}
}
Pseudocode:
sub sort_numbers_lexicographically (array) {
for 0 <= i < array.length:
array[i] = munge(array[i]);
sort(array); // using usual numeric comparisons
for 0 <= i < array.length:
array[i] = unmunge(array[i]);
}
So, what are munge
and unmunge
?
munge
is different depending on the integer size. For example:
sub munge (4-bit-unsigned-integer n) {
switch (n):
case 0: return 0
case 1: return 1
case 2: return 8
case 3: return 9
case 4: return 10
case 5: return 11
case 6: return 12
case 7: return 13
case 8: return 14
case 9: return 15
case 10: return 2
case 11: return 3
case 12: return 4
case 13: return 5
case 14: return 6
case 15: return 7
}
Esentially what munge is doing is saying what order 4 bit integers come in when sorted lexigraphically. I'm sure you can see that there is a pattern here --- I didn't have to use a switch --- and that you can write a version of munge
that handles 32 bit integers reasonably easily. Think about how you would write versions of munge
for 5, 6, and 7 bit integers if you can't immediately see the pattern.
unmunge
is the inverse of munge
.
So you can avoid converting anything to a string --- you don't need any extra memory.
If you're going for space-wise efficiency, I'd try just doing the work in the comparison function of the sort
int compare(int a, int b) {
// convert a to string
// convert b to string
// return -1 if a < b, 0 if they are equal, 1 if a > b
}
If it's too slow (it's slower than preprocessing, for sure), keep track of the conversions somewhere so that the comparison function doesn't keep having to do them.
Executable pseudo-code (aka Python): thenumbers.sort(key=str)
. Yeah, I know that using Python is kind of like cheating -- it's just too powerful;-). But seriously, this also means: if you can sort an array of strings lexicographically, as Python's sort intrinsically can, then just make the "key string" out of each number and sort that auxiliary array (you can then reconstruct the desired numbers array by a str->int transformation, or by doing the sort on the indices via indirection, etc etc); this is known as DSU (Decorate, Sort, Undecorate) and it's what the key=
argument to Python's sort implements.
In more detail (pseudocode):
aux
as long as the numbers
arraylength of numbers-1
, aux[i]=stringify(numbers[i])
indices
of the same lengthlength of numbers-1
, indices[i]=i
indices
, using as cmp(i,j)
strcmp(aux[i],aux[j])
results
of the same lengthlength of numbers-1
, results[i]=numbers[indices[i]]
results
over numbers
aux[i]
, and also aux
, indices
, results
Possible optimization: Instead of this:
I converted each integer to its string format, then added zeros to its right to make all the integers contain the same number of digits
you can multiply each number by (10^N - log10(number)), N being a number larger than log10 of any of your numbers.