Why can't I mmap(MAP_FIXED) the highest virtual page in a 32-bit Linux process on a 64-bit kernel?

离开以前 2020-12-10 16:55

While attempting to test "Is it allowed to access memory that spans the zero boundary in x86?" in user space on Linux, I wrote a 32-bit test program that tries to map the low

1 answer
  •  情歌与酒
    2020-12-10 17:29

    The mmap function eventually calls either do_mmap or do_brk_flags, which do the actual work of satisfying the memory allocation request. These functions in turn call get_unmapped_area, which checks that memory cannot be allocated beyond the user address space limit. That limit is defined by TASK_SIZE. Quoting from the code:

    /*
     * There are a few constraints that determine this:
     *
     * On Intel CPUs, if a SYSCALL instruction is at the highest canonical
     * address, then that syscall will enter the kernel with a
     * non-canonical return address, and SYSRET will explode dangerously.
     * We avoid this particular problem by preventing anything executable
     * from being mapped at the maximum canonical address.
     *
     * On AMD CPUs in the Ryzen family, there's a nasty bug in which the
     * CPUs malfunction if they execute code from the highest canonical page.
     * They'll speculate right off the end of the canonical space, and
     * bad things happen.  This is worked around in the same way as the
     * Intel problem.
     */
    #define TASK_SIZE_MAX   ((1UL << __VIRTUAL_MASK_SHIFT) - PAGE_SIZE)

    #define IA32_PAGE_OFFSET    ((current->personality & ADDR_LIMIT_3GB) ? \
                        0xc0000000 : 0xFFFFe000)

    #define TASK_SIZE       (test_thread_flag(TIF_ADDR32) ? \
                        IA32_PAGE_OFFSET : TASK_SIZE_MAX)
    

    On processors with 48-bit virtual address spaces, __VIRTUAL_MASK_SHIFT is 47.
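
    As a quick check of that arithmetic, here is a minimal C sketch. It assumes 4 KiB pages, and sketch_task_size_max and SKETCH_PAGE_SIZE are hypothetical names made up for illustration, not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: 4 KiB pages, as on typical x86-64 systems. */
#define SKETCH_PAGE_SIZE UINT64_C(4096)

/*
 * Mirrors the shape of TASK_SIZE_MAX: the size of the lower canonical
 * half (1 << shift) minus one page that is kept out of user space.
 */
uint64_t sketch_task_size_max(unsigned shift)
{
    assert(shift < 64); /* shifting a 64-bit value by 64 or more is undefined */
    return (UINT64_C(1) << shift) - SKETCH_PAGE_SIZE;
}
```

    With a shift of 47, sketch_task_size_max(47) evaluates to 0x7ffffffff000, which is the lower canonical half with its last page excluded.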

    Note that TASK_SIZE depends on the kind of process: 32-bit on a 32-bit kernel, 32-bit on a 64-bit kernel, or 64-bit on a 64-bit kernel. For 32-bit processes, two pages are reserved: one for the vsyscall page and one as a guard page. Because the vsyscall page cannot be unmapped, the user address space effectively ends at 0xFFFFe000, so a MAP_FIXED request for the highest page (0xFFFFF000) falls outside it. For 64-bit processes, one guard page is reserved. These pages are reserved only on 64-bit Intel and AMD processors because only those processors use the SYSCALL mechanism.

    Here is the check that is performed in get_unmapped_area:

    if (addr > TASK_SIZE - len)
        return -ENOMEM;
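
    To see how this check rejects the highest page for a 32-bit process on a 64-bit kernel, here is a small user-space model in C. It is only a sketch: sketch_range_check and the SKETCH_ constants are hypothetical names, the limit is hard-coded to the 0xFFFFe000 value quoted above (the case without ADDR_LIMIT_3GB), and the extra len guard is there so the unsigned subtraction in this sketch cannot wrap around.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumptions for illustration: 4 KiB pages and the 32-bit-on-64-bit limit. */
#define SKETCH_PAGE_SIZE_32   UINT32_C(0x1000)
#define SKETCH_IA32_TASK_SIZE UINT32_C(0xFFFFe000)

/*
 * Models the range check quoted above for a fixed-address request.
 * Returns 0 if [addr, addr + len) fits below the limit, -ENOMEM otherwise.
 */
int sketch_range_check(uint32_t addr, uint32_t len)
{
    /* MAP_FIXED addresses must be page-aligned; the sketch assumes that. */
    assert(addr % SKETCH_PAGE_SIZE_32 == 0);

    /* Keeps the subtraction below from wrapping in unsigned arithmetic. */
    if (len > SKETCH_IA32_TASK_SIZE)
        return -ENOMEM;

    /* Same comparison as the quoted check. */
    if (addr > SKETCH_IA32_TASK_SIZE - len)
        return -ENOMEM;

    return 0;
}
```

    In this model, sketch_range_check(0xFFFFF000, 0x1000) returns -ENOMEM, and so does a request at 0xFFFFe000. The last one-page request that passes starts at 0xFFFFd000.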
    
