munmap() failure with ENOMEM with private anonymous mapping

喜夏-厌秋 提交于 2019-12-18 03:51:13

问题


I have recently discovered that Linux does not guarantee that memory allocated with mmap can be freed with munmap if this leads to situation when number of VMA (Virtual Memory Area) structures exceed vm.max_map_count. Manpage states this (almost) clearly:

 ENOMEM The process's maximum number of mappings would have been exceeded.
 This error can also occur for munmap(), when unmapping a region
 in the middle of an existing mapping, since this results in two
 smaller mappings on either side of the region being unmapped.

The problem is that Linux kernel always tries to merge VMA structures if possible, making munmap fail even for separately created mappings. I was able to write a small program to confirm this behavior:

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

#include <sys/mman.h>

// value of vm.max_map_count
#define VM_MAX_MAP_COUNT        (65530)

// number of vma for the empty process linked against libc - /proc/<id>/maps
#define VMA_PREMAPPED           (15)

#define VMA_SIZE                (4096)
#define VMA_COUNT               ((VM_MAX_MAP_COUNT - VMA_PREMAPPED) * 2)

int main(void)
{
    static void *vma[VMA_COUNT];

    for (int i = 0; i < VMA_COUNT; i++) {
        vma[i] = mmap(0, VMA_SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);

        if (vma[i] == MAP_FAILED) {
            printf("mmap() failed at %d\n", i);
            return 1;
        }
    }

    for (int i = 0; i < VMA_COUNT; i += 2) {
        if (munmap(vma[i], VMA_SIZE) != 0) {
            printf("munmap() failed at %d (%p): %m\n", i, vma[i]);
        }
    }
}

It allocates a large number of pages (twice the default allowed maximum) using mmap, then munmaps every second page to create separate VMA structure for each remaining page. On my machine the last munmap call always fails with ENOMEM.

Initially I thought that munmap never fails if used with the same values for address and size that were used to create mapping. Apparently this is not the case on Linux and I was not able to find information about similar behavior on other systems.

At the same time in my opinion partial unmapping applied to the middle of a mapped region is expected to fail on any OS for every sane implementation, but I haven't found any documentation that says such failure is possible.

I would usually consider this a bug in the kernel, but knowing how Linux deals with memory overcommit and OOM I am almost sure this is a "feature" that exists to improve performance and decrease memory consumption.

Other information I was able to find:

  • Similar APIs on Windows do not have this "feature" due to their design (see MapViewOfFile, UnmapViewOfFile, VirtualAlloc, VirtualFree) - they simply do not support partial unmapping.
  • glibc malloc implementation does not create more than 65535 mappings, backing off to sbrk when this limit is reached: https://code.woboq.org/userspace/glibc/malloc/malloc.c.html. This looks like a workaround for this issue, but it is still possible to make free silently leak memory.
  • jemalloc had trouble with this and tried to avoid using mmap/munmap because of this issue (I don't know how it ended for them).

Do other OS's really guarantee deallocation of memory mappings? I know Windows does this, but what about other Unix-like operating systems? FreeBSD? QNX?


EDIT: I am adding example that shows how glibc's free can leak memory when internal munmap call fails with ENOMEM. Use strace to see that munmap fails:

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

#include <sys/mman.h>

// value of vm.max_map_count
#define VM_MAX_MAP_COUNT        (65530)

#define VMA_MMAP_SIZE           (4096)
#define VMA_MMAP_COUNT          (VM_MAX_MAP_COUNT)

// glibc's malloc default mmap_threshold is 128 KiB
#define VMA_MALLOC_SIZE         (128 * 1024)
#define VMA_MALLOC_COUNT        (VM_MAX_MAP_COUNT)

int main(void)
{
    static void *mmap_vma[VMA_MMAP_COUNT];

    for (int i = 0; i < VMA_MMAP_COUNT; i++) {
        mmap_vma[i] = mmap(0, VMA_MMAP_SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);

        if (mmap_vma[i] == MAP_FAILED) {
            printf("mmap() failed at %d\n", i);
            return 1;
        }
    }

    for (int i = 0; i < VMA_MMAP_COUNT; i += 2) {
        if (munmap(mmap_vma[i], VMA_MMAP_SIZE) != 0) {
            printf("munmap() failed at %d (%p): %m\n", i, mmap_vma[i]);
            return 1;
        }
    }

    static void *malloc_vma[VMA_MALLOC_COUNT];

    for (int i = 0; i < VMA_MALLOC_COUNT; i++) {
        malloc_vma[i] = malloc(VMA_MALLOC_SIZE);

        if (malloc_vma[i] == NULL) {
            printf("malloc() failed at %d\n", i);
            return 1;
        }
    }

    for (int i = 0; i < VMA_MALLOC_COUNT; i += 2) {
        free(malloc_vma[i]);
    }
}

回答1:


One way to work around this problem on Linux is to mmap more that 1 page at once (e.g. 1 MB at a time), and also map a separator page after it. So, you actually call mmap on 257 pages of memory, then remap the last page with PROT_NONE, so that it cannot be accessed. This should defeat the VMA merging optimization in the kernel. Since you are allocating many pages at once, you should not run into the max mapping limit. The downside is you have to manually manage how you want to slice the large mmap.

As to your questions:

  1. System calls can fail on any system for a variety of reasons. Documentation is not always complete.

  2. You are allowed to munmap a part of a mmapd region as long as the address passed in lies on a page boundary, and the length argument is rounded up to the next multiple of the page size.



来源:https://stackoverflow.com/questions/43743555/munmap-failure-with-enomem-with-private-anonymous-mapping

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!