How to sort a very large array in C

喜欢而已 submitted on 2019-12-13 03:37:41

Question


I want to sort on the order of four million long longs in C. Normally I would just malloc() a buffer to use as an array and call qsort() but four million * 8 bytes is one huge chunk of contiguous memory.

What's the easiest way to do this? I rate ease over pure speed for this. I'd prefer not to use any libraries and the result will need to run on a modest netbook under both Windows and Linux.


Answer 1:


Just allocate a buffer and call qsort. 32MB isn't so very big these days even on a modest netbook.
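A minimal sketch of the "just sort it" approach, assuming the values are already in (or can be read into) one malloc'd buffer. The names cmp_ll, sort_ll and make_sorted are made up for illustration, and the random fill merely stands in for the real data. The comparator compares instead of subtracting, since the difference of two long longs can overflow.

```c
#include <stdio.h>
#include <stdlib.h>

/* qsort comparator for long long. Returns -1, 0 or 1 by comparing,
   because x - y could overflow for values far apart. */
int cmp_ll(const void *a, const void *b)
{
    long long x = *(const long long *)a;
    long long y = *(const long long *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: sort n values in place, ascending. */
void sort_ll(long long *v, size_t n)
{
    qsort(v, n, sizeof *v, cmp_ll);
}

/* Hypothetical helper: allocate n values (about 32 MB for four million
   8-byte long longs), fill them with placeholder pseudo-random data,
   and sort. Returns NULL if the allocation fails; the caller frees. */
long long *make_sorted(size_t n)
{
    size_t i;
    long long *v = malloc(n * sizeof *v);

    if (v == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        v[i] = ((long long)rand() << 16) ^ rand();
    sort_ll(v, n);
    return v;
}
```

Calling make_sorted(4000000) and checking the result against NULL is the whole program; if the allocation ever does fail, that is the point at which splitting the work up becomes worth considering.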

If you really must split it up: sort smaller chunks, write them to files, and merge them (a merge takes a single linear pass over each of the things being merged). But, really, don't. Just sort it.

(There's a good discussion of the sort-and-merge approach in volume 2 of Knuth, where it's called "external sorting". When Knuth was writing that, the external data would have been on magnetic tape, but the principles aren't very different with discs: you still want your I/O to be as sequential as possible. The tradeoffs are a bit different with SSDs.)
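For completeness, here is a rough sketch of the sort-smaller-chunks-and-merge idea, not a tuned implementation. It assumes the input is a binary stream of long long values, that tmpfile() works on the target system (on some Windows setups it may fail without sufficient permissions), and that the number of chunks stays under an arbitrary cap. external_sort, MAX_RUNS and chunk_elems are names invented for this example.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_RUNS 64   /* arbitrary cap on the number of temporary files */

static int cmp_ll(const void *a, const void *b)
{
    long long x = *(const long long *)a;
    long long y = *(const long long *)b;
    return (x > y) - (x < y);
}

/* Reads long longs from `in` and writes them to `out` in ascending order,
   holding at most chunk_elems values in memory at once.
   Returns 0 on success, -1 on failure. */
int external_sort(FILE *in, FILE *out, size_t chunk_elems)
{
    FILE *run[MAX_RUNS];
    long long head[MAX_RUNS];  /* next unread value of each run */
    int live[MAX_RUNS];        /* nonzero while the run still has data */
    size_t nruns = 0, n, i;
    long long *buf;
    int rc = -1;

    if (chunk_elems == 0)
        return -1;
    buf = malloc(chunk_elems * sizeof *buf);
    if (buf == NULL)
        return -1;

    /* Phase 1: sort each chunk in memory and spill it to its own file. */
    while ((n = fread(buf, sizeof *buf, chunk_elems, in)) > 0) {
        if (nruns == MAX_RUNS)
            goto done;
        if ((run[nruns] = tmpfile()) == NULL)
            goto done;
        nruns++;
        qsort(buf, n, sizeof *buf, cmp_ll);
        if (fwrite(buf, sizeof *buf, n, run[nruns - 1]) != n)
            goto done;
        rewind(run[nruns - 1]);
    }

    /* Phase 2: k-way merge. Each run is read once, front to back. */
    for (i = 0; i < nruns; i++)
        live[i] = fread(&head[i], sizeof head[i], 1, run[i]) == 1;
    for (;;) {
        size_t best = nruns;   /* index of the run with the smallest head */
        for (i = 0; i < nruns; i++)
            if (live[i] && (best == nruns || head[i] < head[best]))
                best = i;
        if (best == nruns)
            break;             /* every run is exhausted */
        if (fwrite(&head[best], sizeof head[best], 1, out) != 1)
            goto done;
        live[best] = fread(&head[best], sizeof head[best], 1, run[best]) == 1;
    }
    rc = 0;

done:
    for (i = 0; i < nruns; i++)
        fclose(run[i]);
    free(buf);
    return rc;
}
```

The linear scan for the smallest head is fine for a handful of runs; with many runs a small heap over the run heads would be the usual replacement.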




Answer 2:


32 MB? That's not too big. Quicksort should do the trick.




Answer 3:


Your best option would be to avoid having the data unordered in the first place, if possible. As has been mentioned, you'd be better off reading the data from disk (or the network, or whatever the source is) directly into a self-organizing container (a tree; perhaps std::set will do).

That way, you'll never have to sort through the lot or worry about memory management. If you know the required capacity of the container, you might squeeze out additional performance by using std::vector(initialcapacity) or calling vector::reserve up front.

You would then be best advised to use std::make_heap to heapify any existing elements, and then add elements one by one using push_heap (see also pop_heap). This is essentially the same paradigm as the self-ordering set, but:

  • duplicates are ok
  • the storage is 'optimized' as a flat array (which is perfect for e.g. shared memory maps or memory mapped files)

(Oh, minor detail, note that sort_heap on the heap takes at most N log N comparisons, where N is the number of elements)
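Since the question is about C, here is a hand-written sketch of the same heap operations on a flat array of long long. It is not the C++ standard library's implementation; heap_make, heap_push and heap_sort_in_place are names invented here to mirror the roles of std::make_heap, std::push_heap and std::sort_heap.

```c
#include <stddef.h>

static void swap_ll(long long *a, long long *b)
{
    long long t = *a;
    *a = *b;
    *b = t;
}

/* Restore the max-heap property below index i within h[0..n). */
static void sift_down(long long *h, size_t i, size_t n)
{
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < n && h[l] > h[m]) m = l;
        if (r < n && h[r] > h[m]) m = r;
        if (m == i)
            return;
        swap_ll(&h[i], &h[m]);
        i = m;
    }
}

/* Turn h[0..n) into a max-heap (the role of std::make_heap). */
void heap_make(long long *h, size_t n)
{
    size_t i;
    for (i = n / 2; i-- > 0; )
        sift_down(h, i, n);
}

/* h[0..n-1) is already a heap and h[n-1] holds the new element:
   move it up to its place (the role of std::push_heap). */
void heap_push(long long *h, size_t n)
{
    size_t i;
    if (n == 0)
        return;
    i = n - 1;
    while (i > 0 && h[(i - 1) / 2] < h[i]) {
        swap_ll(&h[(i - 1) / 2], &h[i]);
        i = (i - 1) / 2;
    }
}

/* Turn the heap h[0..n) into an ascending array
   (the role of std::sort_heap). */
void heap_sort_in_place(long long *h, size_t n)
{
    while (n > 1) {
        n--;
        swap_ll(&h[0], &h[n]);  /* largest value goes to its final slot */
        sift_down(h, 0, n);
    }
}
```

A typical use would be: call heap_make once on whatever is already in the buffer, write each newly arrived value to h[n], increment n and call heap_push(h, n), then call heap_sort_in_place when an ordered array is needed. Duplicates are handled without any special casing.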

Let me know if you think this is an interesting approach. I'd really need a bit more info on the use case.



Source: https://stackoverflow.com/questions/5588041/how-to-sort-a-very-large-array-in-c
