Question
I want to sort on the order of four million long long values in C. Normally I would just malloc() a buffer to use as an array and call qsort(), but four million * 8 bytes is one huge chunk of contiguous memory.
What's the easiest way to do this? I rate ease over pure speed for this. I'd prefer not to use any libraries and the result will need to run on a modest netbook under both Windows and Linux.
Answer 1:
Just allocate a buffer and call qsort. 32 MB isn't so very big these days, even on a modest netbook.
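A minimal sketch of the allocate-and-qsort approach, assuming the values are already available to copy into the buffer. The names cmp_ll, alloc_values and sort_values are made up for illustration. The comparator compares rather than subtracts, because subtracting two long long values can overflow.

```c
#include <stdlib.h>

/* Comparator for qsort over long long. Returns -1, 0 or 1 without
   subtracting the operands, so it cannot overflow. */
static int cmp_ll(const void *a, const void *b)
{
    long long x = *(const long long *)a;
    long long y = *(const long long *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: one contiguous buffer for n values.
   Returns NULL if the allocation fails; the caller frees it. */
static long long *alloc_values(size_t n)
{
    return malloc(n * sizeof(long long));
}

/* Hypothetical helper: sorts the buffer in ascending order in place. */
static void sort_values(long long *v, size_t n)
{
    qsort(v, n, sizeof *v, cmp_ll);
}
```

For four million elements, alloc_values(4000000) requests 32,000,000 bytes on platforms where long long is 8 bytes. Checking the result for NULL covers the case where the machine cannot provide that much.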
If you really must split it up: sort smaller chunks, write them to files, and merge them (a merge takes a single linear pass over each of the things being merged). But, really, don't. Just sort it.
(There's a good discussion of the sort-and-merge approach in volume 2 of Knuth, where it's called "external sorting". When Knuth was writing that, the external data would have been on magnetic tape, but the principles aren't very different with discs: you still want your I/O to be as sequential as possible. The tradeoffs are a bit different with SSDs.)
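If splitting really were necessary, the merge step could look roughly like the sketch below. It assumes each chunk was sorted in memory and written with fwrite as raw long long values to a file opened in binary mode; merge_runs is a hypothetical name, and only two runs are merged at a time. Each input is read once, front to back, which keeps the I/O sequential.

```c
#include <stdio.h>

/* Merges two files of sorted binary long long values into out.
   Returns 0 on success, -1 on a write error. A short or failed
   read is treated as the end of that run. */
int merge_runs(FILE *a, FILE *b, FILE *out)
{
    long long x, y;
    int have_x = fread(&x, sizeof x, 1, a) == 1;
    int have_y = fread(&y, sizeof y, 1, b) == 1;

    while (have_x || have_y) {
        /* Take from a when b is exhausted or a's head is not larger. */
        if (have_x && (!have_y || x <= y)) {
            if (fwrite(&x, sizeof x, 1, out) != 1) return -1;
            have_x = fread(&x, sizeof x, 1, a) == 1;
        } else {
            if (fwrite(&y, sizeof y, 1, out) != 1) return -1;
            have_y = fread(&y, sizeof y, 1, b) == 1;
        }
    }
    return 0;
}
```

With more than two chunks, the same function can be applied repeatedly to pairs of runs until one file remains. On Windows the files need to be opened with "rb" and "wb" so the bytes are not translated.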
Answer 2:
32 MB? That's not too big. Quicksort should do the trick.
Answer 3:
Your best option would be to avoid having the data unordered in the first place, if possible. As has been mentioned, you'd be better off reading the data from disk (or the network, or whatever the source is) directly into a self-organizing container (a tree; perhaps std::set will do).
That way, you'll never have to sort through the lot or worry about memory management. If you know the required capacity of the container, you might squeeze out additional performance by sizing a std::vector up front, either with std::vector(initialcapacity) (which also creates that many elements) or by calling vector::reserve (which only reserves storage).
You'd then be best advised to use std::make_heap to heapify any existing elements, and then add element by element using push_heap (see also pop_heap). This is essentially the same paradigm as the self-ordering set, but:
- duplicates are ok
- the storage is 'optimized' as a flat array (which is perfect for e.g. shared memory maps or memory mapped files)
(Oh, minor detail: note that sort_heap on the heap takes at most N log N comparisons, where N is the number of elements.)
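Since the question asks for plain C, here is a rough hand-rolled counterpart of push_heap and sort_heap over a flat long long array. This is a sketch, not the standard library's implementation; heap_push, sift_down and heap_sort_in_place are hypothetical names. The caller writes each new value at index n-1 and then calls heap_push with the new count.

```c
#include <stddef.h>

/* Restores the max-heap property after a new value has been stored at
   index n-1, by moving it up while it is larger than its parent. */
void heap_push(long long *h, size_t n)
{
    if (n == 0) return;
    size_t i = n - 1;
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (h[parent] >= h[i]) break;
        long long t = h[parent]; h[parent] = h[i]; h[i] = t;
        i = parent;
    }
}

/* Moves h[i] down within the first n elements until neither child
   is larger than it. */
static void sift_down(long long *h, size_t n, size_t i)
{
    for (;;) {
        size_t largest = i, l = 2 * i + 1, r = l + 1;
        if (l < n && h[l] > h[largest]) largest = l;
        if (r < n && h[r] > h[largest]) largest = r;
        if (largest == i) return;
        long long t = h[i]; h[i] = h[largest]; h[largest] = t;
        i = largest;
    }
}

/* Turns a max-heap of n elements into ascending order in place. */
void heap_sort_in_place(long long *h, size_t n)
{
    while (n > 1) {
        n--;
        long long t = h[0]; h[0] = h[n]; h[n] = t;  /* current max goes to the end */
        sift_down(h, n, 0);
    }
}
```

Duplicates are kept, and the storage stays a single flat array, which matches the two points in the list above.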
Let me know if you think this is an interesting approach. I'd really need a bit more info on the use case.
Source: https://stackoverflow.com/questions/5588041/how-to-sort-a-very-large-array-in-c