Question
What are the reasons a malloc() would fail, especially in 64 bit?
My specific problem is trying to malloc a huge 10 GB chunk of RAM on a 64-bit system. The machine has 12 GB of RAM and 32 GB of swap. Yes, the malloc is extreme, but why would it be a problem? This is on Windows XP64 with both the Intel and MSFT compilers. The malloc sometimes succeeds and sometimes fails, about 50% of the time. 8 GB mallocs always work; 20 GB mallocs always fail. If a malloc fails, repeated requests won't work unless I quit the process and start a fresh one (which then has the same 50% shot at success). No other big apps are running. It happens even immediately after a fresh reboot.
I could imagine a malloc failing in 32-bit if you have used up the 32 (or 31) bits of address space available, such that there's no address range large enough to assign to your request.
I could also imagine malloc failing if you have used up both your physical RAM and your hard drive's swap space. That isn't the case for me.
But why else could a malloc fail? I can't think of other reasons.
I'm more interested in the general malloc question than in my specific example, which I'll likely replace with memory-mapped files anyway. The failed malloc() is just more of a puzzle than anything else... that desire to understand your tools and not be surprised by the fundamentals.
Answer 1:
malloc tries to allocate a contiguous memory range, and this will initially be in real memory simply due to how swap memory works (at least as far as I remember). It could easily be that your OS sometimes can't find a contiguous 10 GB block of memory while still leaving all the processes that require real memory in RAM at the same time (at which point your malloc will fail).
Do you actually require 10 GB of contiguous memory, or could you wrap a storage class/struct around several smaller blocks and use your memory in chunks instead? That relaxes the huge contiguity requirement and should also allow your program to push less-used chunks out to the swap file.
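Here's a minimal sketch of that chunked approach in plain C. The names (chunked_buf, CHUNK_SIZE, chunked_alloc) are illustrative inventions, not from the original answer:

#include <stdlib.h>

#define CHUNK_SIZE ((size_t)256 * 1024 * 1024) /* 256 MB per chunk, for example */

typedef struct {
    void **chunks;   /* array of independently allocated blocks */
    size_t n_chunks;
} chunked_buf;

/* Allocate 'total' bytes as a series of CHUNK_SIZE blocks; returns 0 on success. */
static int chunked_alloc(chunked_buf *b, size_t total)
{
    b->n_chunks = (total + CHUNK_SIZE - 1) / CHUNK_SIZE;
    b->chunks = calloc(b->n_chunks, sizeof(void *));
    if (!b->chunks)
        return -1;
    for (size_t i = 0; i < b->n_chunks; i++) {
        b->chunks[i] = malloc(CHUNK_SIZE);
        if (!b->chunks[i])
            return -1; /* caller should free whatever was allocated */
    }
    return 0;
}

/* Address byte i of the logical buffer. */
static char *chunked_at(const chunked_buf *b, size_t i)
{
    return (char *)b->chunks[i / CHUNK_SIZE] + (i % CHUNK_SIZE);
}

No single allocation needs to be huge, so the allocator only ever has to find CHUNK_SIZE-sized ranges.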
Answer 2:
Have you tried using VirtualAlloc() and VirtualFree() directly? This may help isolate the problem.
- You'll be bypassing the C runtime heap and the NT heap.
- You can reserve virtual address space and then commit it. This will tell you which operation fails.
If the virtual address space reservation fails (even though it shouldn't, judging from what you've said), Sysinternals VMMap may help explain why. Turn on "Show free regions" to look at how the free virtual address space is fragmented.
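A minimal sketch of the reserve-then-commit split described above, using the documented VirtualAlloc/VirtualFree API (the 10 GB size is just this question's example):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    SIZE_T size = (SIZE_T)10 * 1024 * 1024 * 1024; /* 10 GB */

    /* Step 1: reserve contiguous virtual address space only. */
    void *p = VirtualAlloc(NULL, size, MEM_RESERVE, PAGE_NOACCESS);
    if (p == NULL) {
        printf("reserve failed: error %lu\n", GetLastError());
        return 1;
    }

    /* Step 2: commit backing store (RAM/pagefile charge) for the range. */
    if (VirtualAlloc(p, size, MEM_COMMIT, PAGE_READWRITE) == NULL) {
        printf("commit failed: error %lu\n", GetLastError());
        VirtualFree(p, 0, MEM_RELEASE);
        return 1;
    }

    /* ... use the memory ... */
    VirtualFree(p, 0, MEM_RELEASE);
    return 0;
}

If step 1 fails, address space fragmentation is the suspect; if step 2 fails, look at the commit limit instead.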
Answer 3:
Just a guess here, but malloc allocates contiguous memory, and you may not have a sufficiently large contiguous section available to your heap. Here are a few things I would try:
- Where a 20 GB malloc fails, do four 5 GB mallocs succeed? If so, it is a contiguous-space issue (see the sketch after this list).
- Have you checked your compiler switches for anything that limits the total heap size, or the largest heap block size?
- Have you tried writing a program that declares a static variable of the required size? If this works, you could implement your own heap with big mallocs in that space.
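A minimal sketch of the four-5 GB experiment from the first bullet, assuming plain C (whether each call succeeds is exactly what the experiment measures):

#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    size_t chunk = (size_t)5 * 1024 * 1024 * 1024; /* 5 GB */
    void *blocks[4];

    for (int i = 0; i < 4; i++) {
        blocks[i] = malloc(chunk);
        printf("block %d: %s\n", i, blocks[i] ? "ok" : "failed");
    }
    for (int i = 0; i < 4; i++)
        free(blocks[i]); /* free(NULL) is a safe no-op */
    return 0;
}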
Answer 4:
Have you tried using the Win32 heap functions to allocate your memory instead?
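A minimal sketch of that route, assuming the documented Win32 heap API (HeapCreate with a growable private heap, then HeapAlloc):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* dwMaximumSize = 0 creates a growable private heap. */
    HANDLE heap = HeapCreate(0, 0, 0);
    if (heap == NULL)
        return 1;

    SIZE_T size = (SIZE_T)10 * 1024 * 1024 * 1024; /* 10 GB */
    void *p = HeapAlloc(heap, 0, size);
    printf("HeapAlloc: %s\n", p ? "ok" : "failed");

    if (p)
        HeapFree(heap, 0, p);
    HeapDestroy(heap);
    return 0;
}

This bypasses the CRT's malloc layer while still exercising the NT heap, which helps narrow down which layer is imposing the limit.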
Answer 5:
Here's an official source stating that the maximum request size of the heap is defined by the CRT library you link against (_HEAP_MAXREQ). (Aside: your previous code had integer overflows wrapping to 0, which is why you didn't get NULL back.)
http://msdn.microsoft.com/en-us/library/6ewkz86d.aspx
Check out my answer here for large Windows allocations; I include a reference to an MS paper on the Vista/2008 memory model advancements.
In short, the stock CRT does not support, even for a native 64-bit process, any heap request larger than 4 GB. You have to use VirtualAlloc*, CreateFileMapping, or some other analogue.
Oh, I also noticed you are claiming that your larger allocations are actually succeeding. This is incorrect: you are misinterpreting malloc(0x200000000) (that's 8 GB in hex). What is happening is that you are requesting a 0-byte allocation, due to a cast or some other effect of your test harness; you are most definitely not observing anything larger than a 0xfffff000-byte heap being committed. You are simply seeing integer overflows from down-casting.
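A minimal sketch of the down-cast being described, assuming the request passes through a 32-bit integer somewhere in the test harness (the variable names are illustrative):

#include <stdio.h>

int main(void)
{
    unsigned long long requested = 0x200000000ULL;    /* 8 GB */
    unsigned int truncated = (unsigned int)requested; /* 32-bit down-cast */
    printf("requested 0x%llx bytes, truncated to 0x%x bytes\n",
           requested, truncated); /* truncated is 0 */
    return 0;
}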
A WORD TO THE WISE, or: TIPS TO SAVE YOUR HEAP SANITY
The only safe way to allocate memory with malloc (or any other dynamic request):
void *foo = malloc(SIZE);
The size of a dynamic memory request must never (I cannot stress that enough) be calculated within the parentheses of the request itself:
mytype *foo = (mytype *) malloc(sizeof(mytype) * 2); /* arithmetic inside the call */
The danger is that an integer overflow will occur. It is always a coding ERROR to perform the arithmetic at the time of the call; you must always calculate the TOTAL size of the data to be requested before the statement that makes the request.
Why is it so bad? We know this is a mistake because, at the point where a request is made for a dynamic resource, there must be a point in the future where we will use that resource. To use what we have requested, we must know how large it is (e.g. the array count, the type size, etc.). This means that if we ever see any arithmetic at all inside the parentheses of a resource request, it is an error, because we would have to duplicate that calculation again in order to use the data appropriately.
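A minimal sketch of computing the size up front with an overflow check, in C (alloc_array is an illustrative helper, not from the answer):

#include <stdint.h>
#include <stdlib.h>

/* Compute the total byte count before the request, rejecting overflow. */
void *alloc_array(size_t count, size_t elem_size)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return NULL; /* count * elem_size would overflow size_t */
    size_t total = count * elem_size; /* the size now exists as a value we can reuse */
    return malloc(total);
}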
Answer 6:
The problem is that Visual Studio does not define WIN64 when you compile a 64-bit application; it usually still keeps WIN32, which is wrong for 64-bit apps. This then causes the run-time to use the 32-bit value when _HEAP_MAXREQ is defined, so all large malloc() calls will fail. If you change your project (under Project Properties, Preprocessor Definitions) to define WIN64, then the very large malloc() should have no trouble at all.
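A minimal sketch to check which limit a given build actually picked up, assuming the MSVC CRT where _HEAP_MAXREQ is declared in <malloc.h>:

#include <malloc.h>
#include <stdio.h>

int main(void)
{
    /* Prints the largest request the linked CRT heap will accept. */
    printf("_HEAP_MAXREQ = %llu bytes\n", (unsigned long long)_HEAP_MAXREQ);
    return 0;
}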
Answer 7:
"But why else could a malloc fail? I can't think of other reasons."
As has been implied several times already: because of memory fragmentation.
Answer 8:
I found the question interesting, so I tried to research it from a theoretical point of view:
In 64-bit (actually 48 bits usable due to chip limitations, and fewer (44 bits?) due to OS limitations) you certainly should not be limited by virtual memory fragmentation, i.e. a lack of contiguous virtual address space. The reason is that there is just so much virtual address space that it is quite impractical to exhaust it.
Also, we can expect that physical memory fragmentation should not be an issue, as virtual memory means there doesn't need to be a contiguous physical address range in order to satisfy an allocation request. Instead it can be satisfied with any sufficiently large set of memory pages.
So you must be running into something else: i.e. some other limitation that applies to virtual memory.
One other limit which definitely exists on Windows is the commit limit. More information on this:
http://blogs.technet.com/b/markrussinovich/archive/2008/11/17/3155406.aspx
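A minimal sketch for inspecting that limit, assuming the documented GlobalMemoryStatusEx API (ullTotalPageFile reflects the current commit limit, ullAvailPageFile the remaining headroom):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    MEMORYSTATUSEX ms;
    ms.dwLength = sizeof(ms);
    if (GlobalMemoryStatusEx(&ms)) {
        printf("commit limit:     %llu MB\n", ms.ullTotalPageFile / (1024 * 1024));
        printf("commit available: %llu MB\n", ms.ullAvailPageFile / (1024 * 1024));
    }
    return 0;
}

If a 10 GB commit would exceed the available figure, the allocation fails regardless of how much address space is free.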
Other possible limits could exist, e.g. quirks of how the actual implementation has to work with the actual hardware. Imagine that when trying to map virtual address space to physical address space, you run out of entries in the page table to do the virtual address mapping... does the OS memory allocator code care to handle this unlikely scenario? Perhaps not...
You can read more information on how page tables actually work to do virtual address translation here:
http://en.wikipedia.org/wiki/Memory_management_unit
Answer 9:
It is most likely fragmentation. For simplicity, let's use an example.
The memory consists of a single 12 KB module. This memory is organised into 1 KB blocks by the MMU, so you have 12 x 1 KB blocks. Your OS uses 100 bytes, but this is basically the code that manages the page tables, so you cannot swap it out. Then your apps each use 100 bytes.
Now, with just your OS and your application running (200 bytes), you would already be using 200 bytes of memory (occupying two 1 KB blocks), leaving exactly 10 KB available for malloc().
Now, you start by malloc()ing a couple of buffers: A (900 bytes) and B (200 bytes). Then you free A. Now you have 9.8 KB free (non-contiguous). So you try to malloc() C (9 KB). Suddenly, you fail.
You have 8.9 KB contiguous at the tail end and 0.9 KB at the front end. You cannot re-map the first block to the end because B stretches over both the first 1 KB block and the second 1 KB block.
You can still malloc() a single 8 KB block.
Granted, this example is a little contrived, but I hope it helps.
Source: https://stackoverflow.com/questions/833234/64-bit-large-mallocs