Immediate detection of heap corruption errors on Windows. How?

Question

I can't sleep! :)

I have a reasonably large project on Windows and have run into some heap corruption issues. I have read everything on SO, including this nice topic: How to debug heap corruption errors?, but nothing helped out of the box. Debug CRT and BoundsChecker detected the heap corruptions, but the addresses were always different and the detection points were always far away from the actual memory overwrites. I stayed up until the middle of the night and crafted the following hack:

#include <windows.h>

DWORD PageSize = 0;

inline void SetPageSize()
{
    if ( !PageSize )
    {
        SYSTEM_INFO sysInfo;
        GetSystemInfo(&sysInfo);
        PageSize = sysInfo.dwPageSize;
    }
}

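void* operator new (size_t nSize)
{
    SetPageSize();
    // round the request up to whole pages
    // (this adds a full extra page when nSize is already a multiple of PageSize)
    size_t Extra = nSize % PageSize;
    nSize = nSize + ( PageSize - Extra );
    return VirtualAlloc( 0, nSize, MEM_COMMIT, PAGE_READWRITE);
}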

void operator delete (void* pPtr)
{
    MEMORY_BASIC_INFORMATION mbi;
    VirtualQuery(pPtr, &mbi, sizeof(mbi));
    // leave pages in reserved state, but free the physical memory
    VirtualFree(pPtr, 0, MEM_DECOMMIT);
    DWORD OldProtect;
    // protect the address space, so no one can access those pages
    VirtualProtect(pPtr, mbi.RegionSize, PAGE_NOACCESS, &OldProtect);
}

Some heap corruption errors became obvious and I was able to fix them. There were no more Debug CRT warnings on exit. However, I have some questions regarding this hack:

1. Can it produce any false positives?

2. Can it miss some of the heap corruptions? (even if we replace malloc/realloc/free?)

3. It fails on 32-bit with OUT_OF_MEMORY and runs only on 64-bit. Am I right that we simply run out of virtual address space on 32-bit?

Answer 1:

Can it produce any false positives?

So, this will only catch bugs of the class "use after free()". For that purpose, I think, it's reasonably good.

If you try to delete something that wasn't new'ed, that's a different type of bug. In delete you should first check whether the memory was indeed allocated, rather than blindly freeing it and marking it as inaccessible. I'd report (by, say, doing a debug break) any attempt to delete something that shouldn't be deleted because it was never new'ed.
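
One way to sketch that check is to keep a registry of live allocations and consult it before releasing anything. The example below is a minimal, portable illustration and not the asker's code: `AllocationRegistry`, `onAllocate`, `onFree` and `isLive` are hypothetical names. It also assumes single-threaded use, and because `std::unordered_set` itself allocates through operator new, a real replacement of the global operators would need a container whose allocator bypasses them.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical helper: remembers which pointers the allocator has handed out.
class AllocationRegistry {
public:
    // Record a pointer returned by the allocator.
    void onAllocate(void* p) { live_.insert(p); }

    // Returns true if p was live and has now been forgotten. A false result
    // means the caller is deleting something that was never allocated here,
    // or that was already deleted.
    bool onFree(void* p) { return live_.erase(p) == 1; }

    // True while p has been allocated and not yet freed.
    bool isLive(void* p) const { return live_.count(p) != 0; }

private:
    std::unordered_set<void*> live_;
};
```

An operator delete built on this could trigger a debug break whenever `onFree` returns false, instead of decommitting memory it does not own.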

Can it miss some of the heap corruptions? (even if we replace malloc/realloc/free?)

Obviously, this won't catch all corruptions of heap data between new and the respective delete. It will only catch those attempted after delete.

E.g.:

MyObj* myObj = new MyObj(1,2,3);
// corruption of *myObj happens here and may go unnoticed
delete myObj;

It fails on a 32-bit target with an OUT_OF_MEMORY error and runs only on 64-bit. Am I right that we simply run out of virtual address space on 32-bit?

Typically about 2 GB of virtual address space is available to a 32-bit process on Windows. With 4 KB pages, that is good for at most ~524288 new's like the ones in the provided code. With objects bigger than 4 KB, you'll be able to allocate fewer instances than that, and address space fragmentation will reduce the number further.
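
The estimate can be reproduced with a few lines of arithmetic. The sketch below assumes 2 GiB of usable address space and 4 KiB pages; `roundUp`, `maxAllocations` and `kTwoGiB` are hypothetical names. Two things push the real ceiling lower. First, the operator delete in the question only decommits and never releases address space, so every allocation ever made counts against the limit. Second, VirtualAlloc places each new reservation on an allocation-granularity boundary (dwAllocationGranularity in SYSTEM_INFO, typically 64 KiB), which the last line models.

```cpp
#include <cassert>
#include <cstdint>

// Round size up to the next multiple of unit (unit must be non-zero).
constexpr std::uint64_t roundUp(std::uint64_t size, std::uint64_t unit) {
    return (size + unit - 1) / unit * unit;
}

// Upper bound on how many allocations fit into addressSpace bytes when each
// allocation consumes a multiple of granule bytes of address space.
constexpr std::uint64_t maxAllocations(std::uint64_t addressSpace,
                                       std::uint64_t objectSize,
                                       std::uint64_t granule) {
    return addressSpace / roundUp(objectSize == 0 ? 1 : objectSize, granule);
}

constexpr std::uint64_t kTwoGiB = 2ull * 1024 * 1024 * 1024;

// One 4 KiB page per small object: the figure quoted in the answer.
static_assert(maxAllocations(kTwoGiB, 16, 4096) == 524288, "one page each");
// A 5000-byte object needs two pages, which halves the count.
static_assert(maxAllocations(kTwoGiB, 5000, 4096) == 262144, "two pages each");
// Assuming every reservation consumes a 64 KiB slot of address space.
static_assert(maxAllocations(kTwoGiB, 16, 65536) == 32768, "64 KiB slots");
```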

It's a perfectly expected outcome if you create many object instances during the life cycle of your program.

Answer 2:

This won't catch:

use of uninitialized memory (once your pointer is allocated, you can read garbage from it at will)
buffer overruns (unless you overrun the PageSize boundary)

Ideally, you should write a well-known bit pattern before and after your allocated blocks, so that operator delete can check whether they were overwritten (indicating a buffer over- or under-run).

Currently this would be allowed silently in your scheme, and switching back to malloc etc. would allow it to silently damage the heap and show up as an error later on (e.g. when freeing the block after the overrun one).
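
The bit-pattern idea can be sketched on top of malloc/free, independently of the page-based scheme. Everything here is an assumption made for illustration: the names `guardedAlloc`, `guardsIntact` and `guardedFree`, the 0xFD fill byte, the 16-byte guard size, and the block layout described in the first comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Assumed layout: [16-byte header holding the size][front guard][user bytes][back guard]
static const unsigned char kGuardByte = 0xFD;  // arbitrary, recognisable pattern
static const std::size_t kGuardSize = 16;
static const std::size_t kHeaderSize = 16;     // keeps user data 16-byte aligned

// Allocates size user bytes surrounded by two guard areas.
void* guardedAlloc(std::size_t size) {
    unsigned char* raw = static_cast<unsigned char*>(
        std::malloc(kHeaderSize + kGuardSize + size + kGuardSize));
    if (!raw) return nullptr;
    std::memcpy(raw, &size, sizeof(size));                   // remember the user size
    std::memset(raw + kHeaderSize, kGuardByte, kGuardSize);  // front guard
    unsigned char* user = raw + kHeaderSize + kGuardSize;
    std::memset(user + size, kGuardByte, kGuardSize);        // back guard
    return user;
}

// Returns true if both guard areas still hold the pattern.
bool guardsIntact(const void* userPtr) {
    const unsigned char* user = static_cast<const unsigned char*>(userPtr);
    const unsigned char* raw = user - kGuardSize - kHeaderSize;
    std::size_t size;
    std::memcpy(&size, raw, sizeof(size));
    for (std::size_t i = 0; i < kGuardSize; ++i) {
        if (raw[kHeaderSize + i] != kGuardByte) return false;  // under-run
        if (user[size + i] != kGuardByte) return false;        // over-run
    }
    return true;
}

// Checks the guards, releases the block, and reports whether it was clean.
bool guardedFree(void* userPtr) {
    if (!userPtr) return true;
    bool ok = guardsIntact(userPtr);
    std::free(static_cast<unsigned char*>(userPtr) - kGuardSize - kHeaderSize);
    return ok;
}
```

Unlike guard pages, this only reports the damage at free time (or whenever `guardsIntact` is called), but it notices overruns of even a single byte.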

You can't catch everything though: for example, if the underlying problem is a (valid) pointer somewhere getting overwritten with garbage, you can't detect this until the damaged pointer is dereferenced.

Answer 3:

Yes, your current approach can miss heap corruptions caused by buffer under- and overruns.
Your operator delete is pretty good!
I implemented operator new in a similar manner, adding guard pages for both under- and overruns.
From the GFlags documentation I conclude that it protects only against overruns.

Note that if you simply return a pointer right after the underrun guard page, the overrun guard page is likely to be located some distance from the end of the allocated object, so the immediate vicinity after the object is NOT guarded.
To compensate, you would need to return a pointer that places the object immediately before the overrun guard page (in which case an underrun is, in turn, less likely to be detected).
The code below alternates between the two placements on each call to operator new. Alternatively, you could modify it to use a thread-safe random generator, to avoid any interference with the pattern of the code calling operator new.
With all this in mind, be aware that detecting under- and overruns with the code below is still probabilistic to a degree. This is especially relevant for objects that are allocated only once for the entire duration of the program.

NB! Because operator new returns an adjusted address, operator delete also had to be adjusted a bit, so it now uses mbi.AllocationBase instead of ptr for VirtualFree() and VirtualProtect().

PS. Driver Verifier's Special Pool uses similar tricks.

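// NB! seeded from the tick count so that the alternation pattern differs across program runs
// and different checks are applied to global singleton objects
// (an unseeded rand() would return the same value on every run)
volatile LONG priorityForUnderrun = (LONG)GetTickCount();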

void ProtectMemRegion(void* region_ptr, size_t sizeWithGuardPages)
{
    size_t preRegionGuardPageAddress = (size_t)region_ptr;
    size_t postRegionGuardPageAddress = (size_t)(region_ptr) + sizeWithGuardPages - PageSize;   

    DWORD flOldProtect1;
    BOOL preRegionProtectSuccess = VirtualProtect(
        (void*)(preRegionGuardPageAddress),
        PageSize,
        PAGE_NOACCESS,
        &flOldProtect1  
    );

    DWORD flOldProtect2;
    BOOL postRegionProtectSuccess = VirtualProtect(
        (void*)(postRegionGuardPageAddress),
        PageSize,
        PAGE_NOACCESS,
        &flOldProtect2  
    );
}   

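void* operator new (size_t size)
{
    SetPageSize();  // PageSize and SetPageSize() come from the question's code

    // payload rounded up to whole pages, plus one guard page on each side
    size_t sizeWithGuardPages = (size + PageSize - 1) / PageSize * PageSize + 2 * PageSize;

    void* ptr = VirtualAlloc
    (
        NULL,
        sizeWithGuardPages,
        MEM_COMMIT | MEM_RESERVE,
        PAGE_READWRITE
    );

    if (ptr == NULL)    //NB! check for allocation failures
    {
        // a replacement operator new must not return NULL (needs <new>)
        throw std::bad_alloc();
    }

    ProtectMemRegion(ptr, sizeWithGuardPages);

    void* result;
    if (InterlockedIncrement(&priorityForUnderrun) % 2)
        // object starts right after the underrun guard page
        result = (void*)((size_t)(ptr) + PageSize);
    else
        // object ends just before the overrun guard page, rounded down for alignment
        result = (void*)(((size_t)(ptr) + sizeWithGuardPages - PageSize - size) / sizeof(size_t) * sizeof(size_t));

    return result;
}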

void operator delete (void* ptr)
{
    if (ptr == NULL)    // deleting a null pointer must be a no-op
        return;

    MEMORY_BASIC_INFORMATION mbi;
    DWORD OldProtect;

    VirtualQuery(ptr, &mbi, sizeof(mbi));
    // leave pages in reserved state, but free the physical memory
    VirtualFree(mbi.AllocationBase, 0, MEM_DECOMMIT);
    // protect the address space, so no one can access those pages
    VirtualProtect(mbi.AllocationBase, mbi.RegionSize, PAGE_NOACCESS, &OldProtect);
}
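
The placement arithmetic in operator new above can be checked in isolation, which also makes the "probabilistic" caveat concrete. This is a portable sketch with hypothetical names (`Placement`, `regionSizeFor`, `place`); it works with offsets from the region base rather than absolute addresses, which is equivalent because VirtualAlloc returns a page-aligned base.

```cpp
#include <cassert>
#include <cstddef>

struct Placement {
    std::size_t regionSize;   // total bytes, including both guard pages
    std::size_t offset;       // offset of the returned pointer from the region base
    std::size_t slackBefore;  // unguarded bytes between the front guard and the object
    std::size_t slackAfter;   // unguarded bytes between the object and the back guard
};

// Payload rounded up to whole pages, plus one guard page on each side.
inline std::size_t regionSizeFor(std::size_t size, std::size_t pageSize) {
    return (size + pageSize - 1) / pageSize * pageSize + 2 * pageSize;
}

// favourUnderrun == true:  the object starts right after the front guard page.
// favourUnderrun == false: the object ends as close to the back guard page as
//                          align-byte alignment allows.
inline Placement place(std::size_t size, std::size_t pageSize,
                       std::size_t align, bool favourUnderrun) {
    Placement p;
    p.regionSize = regionSizeFor(size, pageSize);
    const std::size_t usableEnd = p.regionSize - pageSize;  // start of the back guard
    p.offset = favourUnderrun ? pageSize
                              : (usableEnd - size) / align * align;
    p.slackBefore = p.offset - pageSize;
    p.slackAfter = usableEnd - (p.offset + size);
    return p;
}
```

For a 100-byte object with 4096-byte pages and 8-byte alignment, the underrun-favouring placement leaves 3996 unguarded bytes after the object, while the overrun-favouring placement leaves 4 bytes after it and 3992 before it. Whichever placement a given allocation happens to get decides which kind of bug it can catch immediately.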

Source: https://stackoverflow.com/questions/12724057/immediate-detection-of-heap-corruption-errors-on-windows-how

Tags: c++, heap-memory, heap-corruption, virtualalloc