mremap(2) with HugeTLB to change virtual address?

Anonymous (unverified), submitted 2019-12-03 08:59:04

Question:

Can the Linux mremap(2) function move a HugeTLB mapping obtained from mmap() to a new, fixed virtual address?

(Background: I want to remap the virtual address based on the physical address of the memory I get. This is to efficiently perform virtual to physical address translations by inspecting pointer addresses directly. I will use the memory for DMA to hardware from userspace.)
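To make the intended scheme concrete: if each buffer is mapped at a virtual address equal to its physical address with a fixed tag OR-ed in, translation becomes a single mask operation. The following is only a minimal sketch. It assumes the tag 0x500000000000 used in the test program below, a 64-bit platform, and physical addresses that never have the tag bits set; the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed tag: the constant OR-ed into the address in the test program. */
#define DMA_TAG 0x500000000000ULL

/* Hypothetical helper: the virtual address at which a buffer with the
 * given physical address would be mapped. Assumes phys has no bits in
 * common with DMA_TAG. */
static inline uint64_t dma_phys_to_virt(uint64_t phys)
{
    return phys | DMA_TAG;
}

/* Hypothetical helper: recover the physical address from a pointer into
 * such a mapping by clearing the tag bits. */
static inline uint64_t dma_virt_to_phys(const void *virt)
{
    return (uint64_t)(uintptr_t)virt & ~DMA_TAG;
}
```

With this layout no lookup table is needed at DMA time, which is the efficiency the question is after.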

This does not seem to work with my simple test program:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>
    #include <stdint.h>

    #define LARGE_PAGE_SIZE (1024*1024*1024)

    int main()
    {
      void *p1;
      void *p2;
      p1 = mmap(NULL, LARGE_PAGE_SIZE, PROT_READ|PROT_WRITE,
                MAP_SHARED|MAP_ANONYMOUS|MAP_HUGETLB|MAP_LOCKED,
                0, 0);
      if (p1 == MAP_FAILED) {
        perror("mmap");
        return 1;
      }
      printf("p1 = %p\n", p1);
      p2 = mremap(p1, LARGE_PAGE_SIZE, LARGE_PAGE_SIZE,
                  MREMAP_MAYMOVE|MREMAP_FIXED,
                  (void*)(((uint64_t)p1) | 0x500000000000ULL));
      if (p2 == MAP_FAILED) {
        perror("mremap");
        return 1;
      }
      printf("p2 = %p\n", p2);
    }

The mmap() succeeds but the mremap() fails:

    $ gcc -o mremap_hugetlb mremap_hugetlb.c && sudo ./mremap_hugetlb
    p1 = 0x2aaac0000000
    mremap: Invalid argument

Note that the new address is calculated from the one obtained by the original mmap(). This is significant. The desired address is not known ahead of time and so I can't simply pass MAP_FIXED to mmap().

The workaround I currently use is to make the mmap() file-backed so that I can then mmap() it again at a fixed address, and munmap() the old mapping. This is suboptimal because it requires me to find a mounted hugetlbfs filesystem and I don't like the complexity of that dependency.

Current code based on the workaround: https://github.com/lukego/snabbswitch/blob/straightline/src/core/memory.c#L56
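For readers who do not want to follow the link, the workaround can be sketched roughly as below. This is not the linked code, only an illustration of the steps described above. The function name and its parameters are hypothetical, the path would normally point into a mounted hugetlbfs, and the target address is passed in rather than derived from the physical address.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the file at `path` somewhere, map it again at
 * `target`, then drop the first mapping. Returns the fixed mapping or
 * MAP_FAILED. On hugetlbfs, `len` and `target` must be multiples of the
 * huge page size. MAP_FIXED silently replaces whatever is already mapped
 * at `target`, so the caller must pick an address that is not in use.
 */
static void *map_file_at(const char *path, size_t len, void *target)
{
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return MAP_FAILED;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return MAP_FAILED;
    }

    /* First mapping: the kernel chooses the address. */
    void *tmp = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (tmp == MAP_FAILED) {
        close(fd);
        return MAP_FAILED;
    }

    /* In the real use case `target` would be computed at this point from
     * the physical address behind `tmp`; the sketch takes it as given. */
    void *fixed = mmap(target, len, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_FIXED, fd, 0);

    munmap(tmp, len);  /* the kernel-chosen mapping is no longer needed */
    close(fd);         /* mappings stay valid after the descriptor closes */
    return fixed;
}
```

Because both mappings share the same file, the second one refers to the same pages as the first, which is what makes the move possible without mremap().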

Answer 1:

Right now it looks like you do have to use hugetlbfs.

Unless I'm mistaken, the problem occurs in the Linux kernel because mm/mremap.c:mremap_to() calls mm/mremap.c:vma_to_resize(), which fails with EINVAL for huge pages.

Perhaps the test is incorrect, or the function lacks code to handle huge pages correctly. I'm wondering if one should contact the linux-kernel and linux-mm mailing lists, to see if this is a bug that should/could be easily fixed. However, that won't help you with users relying on current (and older) kernels.

Remember that when using mmap() on a file descriptor, you usually take a different code path, as each file system can specify its own mmap handler. For hugetlbfs, the code is in fs/hugetlbfs/inode.c:hugetlbfs_file_mmap(). And, like you said, that code path seems to work okay for you.

Note that it is best to let the user configure the hugetlbfs mount point instead of scanning /proc/mounts for one. That way the sysadmin can set up multiple hugetlbfs mount points, each with a different configuration, for each service running on the server. (I'm hoping your service does not require running as root.)
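One simple way to make the mount point configurable is an environment variable with a caller-supplied default. This is just a sketch: the variable name HUGETLBFS_DIR and the function name are made up for illustration, and a command-line option or config file entry would serve equally well.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: return the hugetlbfs directory to use.
 * HUGETLBFS_DIR is an invented variable name; if it is unset or empty,
 * the caller-supplied fallback is returned unchanged.
 */
static const char *hugetlbfs_dir(const char *fallback)
{
    const char *dir = getenv("HUGETLBFS_DIR");
    return (dir != NULL && dir[0] != '\0') ? dir : fallback;
}
```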



Answer 2:

I have found a solution that seems better: System V shared memory (shm).

The shm API can allocate HugeTLB pages and map them multiple times even when no hugetlbfs filesystem is mounted. I allocate the HugeTLB segment with shmget() (passing SHM_HUGETLB) and can then map it any number of times with shmat().
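A rough sketch of that approach follows. It is not the code actually used; the function name and parameters are hypothetical, and the target address is passed in rather than derived from the physical address. Passing SHM_HUGETLB in extra_flags requests huge pages, in which case the length and target address must be multiples of the huge page size.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Hypothetical helper: create a private shm segment, attach it once at a
 * kernel-chosen address, attach it again at `target`, then detach the
 * first attachment. Returns the fixed attachment, or (void *)-1 on error.
 * Pass SHM_HUGETLB in `extra_flags` for huge pages, or 0 for normal pages.
 * `target` must be free: shmat() fails if the range is already mapped
 * (unless SHM_REMAP is used, which this sketch does not do).
 */
static void *shm_alloc_at(size_t len, int extra_flags, void *target)
{
    int id = shmget(IPC_PRIVATE, len, IPC_CREAT | 0600 | extra_flags);
    if (id < 0)
        return (void *)-1;

    /* First attachment: the kernel chooses the address. */
    void *tmp = shmat(id, NULL, 0);
    if (tmp == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return (void *)-1;
    }

    /* In the real use case `target` would be computed here from the
     * physical address behind `tmp`. */
    void *fixed = shmat(id, target, 0);

    shmdt(tmp);
    /* Mark the segment for removal; it is actually destroyed only after
     * the last attachment is detached. */
    shmctl(id, IPC_RMID, NULL);
    return fixed;
}
```

The caller releases the memory with shmdt() on the returned pointer.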


