We\'ve occasionally been getting problems whereby our long-running server processes (running on Windows Server 2003) have thrown an exception due to a memory allocation fail
I think you’ve mistakenly ruled out a memory leak too early. Even a tiny memory leak can cause a severe memory fragmentation.
Assuming your application behaves like the following:
Allocate 10MB
Allocate 1 byte
Free 10MB
(oops, we didn’t free the 1 byte, but who cares about 1 tiny byte)
This seems like a very small leak, you will hardly notice it when monitoring just the total allocated memory size.
But this leak eventually will cause your application memory to look like this:
.
.
Free – 10MB
.
.
[Allocated -1 byte]
.
.
Free – 10MB
.
.
[Allocated -1 byte]
.
.
Free – 10MB
.
.
This leak will not be noticed... until you want to allocate 11MB
Assuming your minidumps had full memory info included, I recommend using DebugDiag to spot possible leaks.
In the generated memory report, examine carefully the allocation count (not size).
I'd suspect a leak before suspecting fragmentation.
For the memory-intensive data structures, you could switch over to a re-usable storage pool mechanism. You might also be able to allocate more stuff on the stack as opposed to the heap, but in practical terms that won't make a huge difference I think.
I'd fire up a tool like valgrind or do some intensive logging to look for resources not being released.
The problem does happen on Unix, although it's usually not as bad.
The Low-framgmentation heap helped us, but my co-workers swear by Smart Heap (it's been used cross platform in a couple of our products for years). Unfortunately due to other circumstances we couldn't use Smart Heap this time.
We also look at block/chunking allocating and trying to have scope-savvy pools/strategies, i.e., long term things here, whole request thing there, short term things over there, etc.
You can help reduce fragmentation by reducing the amount you allocate deallocate.
e.g. say for a web server running a server side script, it may create a string to output the page to. Instead of allocating and deallocating these strings for every page request, just maintain a pool of them, so your only allocating when you need more, but your not deallocating (meaning after a while you get the situation you not allocating anymore either, because you have enough)
You can use _CrtDumpMemoryLeaks(); to dump memory leaks to the debug window when running a debug build, however I believe this is specific to the Visual C compiler. (it's in crtdbg.h)