This is a long text. Please bear with me. Boiled down, the question is: Is there a workable in-place radix sort algorithm?
If your data set is that big, I think a disk-backed buffer approach would work best:
    sort(List elements, int prefix)
        if (elements.Count < THRESHOLD)
            return InMemoryRadixSort(elements, prefix)
        else
            return DiskBackedRadixSort(elements, prefix)

    DiskBackedRadixSort(elements, prefix)
        DiskBackedBuffer[] buckets
        // distribute each element into the bucket for its digit at position `prefix`
        foreach (element in elements)
            buckets[element.MSB(prefix)].Add(element)
        List ret
        // buckets are visited in order, so concatenating the sorted buckets gives a sorted list
        foreach (bucket in buckets)
            ret.AddRange(sort(bucket, prefix + 1))
        return ret
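The pseudocode above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the keys are strings over the alphabet ACGT, temporary files stand in for the `DiskBackedBuffer`, the built-in `sorted` stands in for `InMemoryRadixSort`, and `THRESHOLD` is set unrealistically low so that a small input reaches the disk path. The names `disk_backed_sort` and `in_memory_sort` are made up for this sketch. Each bucket is read back into memory in full before recursing, which is fine only as long as a single bucket fits in memory.

```python
import tempfile

ALPHABET = "ACGT"   # assumed alphabet (DNA bases, as in the GATTACA example)
THRESHOLD = 4       # hypothetical cutoff, kept tiny so the demo reaches the disk path

def in_memory_sort(elements, prefix):
    # Stand-in for InMemoryRadixSort: every element already shares its first
    # `prefix` characters, so comparing the remaining suffixes is enough.
    return sorted(elements, key=lambda s: s[prefix:])

def disk_backed_sort(elements, prefix=0):
    if len(elements) < THRESHOLD:
        return in_memory_sort(elements, prefix)
    # Strings with no character left at `prefix` are all identical at this
    # depth and sort before every longer string sharing the same prefix.
    finished = []
    buckets = {c: tempfile.TemporaryFile(mode="w+") for c in ALPHABET}
    try:
        # Distribution pass: one temporary file per symbol of the alphabet.
        for s in elements:
            if len(s) == prefix:
                finished.append(s)
            else:
                buckets[s[prefix]].write(s + "\n")
        result = finished
        for c in ALPHABET:  # visit buckets in alphabet order
            f = buckets[c]
            f.seek(0)
            bucket = f.read().splitlines()
            f.close()  # release the file before recursing
            result.extend(disk_backed_sort(bucket, prefix + 1))
        return result
    finally:
        for f in buckets.values():
            f.close()  # closing an already closed file is harmless
```

A string containing a character outside `ALPHABET` raises a `KeyError` in this sketch; a real implementation would need to decide how to handle such input.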
I would also experiment with grouping into a larger number of buckets. For instance, if your string was:
GATTACA
the first MSB call would return the bucket for GATT, the first four characters (4^4 = 256 buckets in total for a four-letter alphabet). That way you recurse through fewer levels of disk-backed buffers. This may or may not improve performance, so experiment with it.
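One way to compute such a wider bucket number is to read the four characters as a base-4 number. The sketch below assumes the alphabet ACGT with A=0, C=1, G=2, T=3; the function name `bucket_index` and the `width` parameter are hypothetical. Positions past the end of a short string are treated as code 0, which keeps bucket order consistent with lexicographic order but means a short string can share a bucket with longer ones, so the sort within the bucket still has to order them.

```python
ALPHABET = "ACGT"  # assumed alphabet
CODE = {c: i for i, c in enumerate(ALPHABET)}

def bucket_index(s, prefix, width=4):
    """Map the `width` characters of `s` starting at `prefix` to a bucket
    number in the range [0, len(ALPHABET) ** width)."""
    index = 0
    for i in range(prefix, prefix + width):
        # Pad with code 0 when the string is shorter than the window.
        digit = CODE[s[i]] if i < len(s) else 0
        index = index * len(ALPHABET) + digit
    return index

print(bucket_index("GATTACA", 0))  # G=2, A=0, T=3, T=3 -> 2*64 + 0*16 + 3*4 + 3 = 143
```

With this scheme the recursive call would advance the prefix by `width` rather than by 1.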