Address canonical form and pointer arithmetic

Happy的楠姐 2020-11-28 12:56

On AMD64 compliant architectures, addresses need to be in canonical form before being dereferenced.

From the Intel manual, section 3.3.7.1:

In

1 answer
  • 2020-11-28 13:55

    The canonical address rules mean there is a giant hole in the 64-bit virtual address space. 2^47-1 is not contiguous with the next valid address above it, so a single mmap won't include any of the unusable range of 64-bit addresses.

    +----------+
    | 2^64-1   |   0xffffffffffffffff
    | ...      |
    | 2^64-2^47|   0xffff800000000000
    +----------+
    |          |
    | unusable |      not to scale: this part is about 2^16 times as large as both usable ranges combined
    |          |
    +----------+
    | 2^47-1   |   0x00007fffffffffff
    | ...      |
    | 0        |   0x0000000000000000
    +----------+
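
The layout in the diagram can be expressed as a small check. This is only a sketch of the rule, assuming 48 implemented virtual-address bits (4-level paging); the name `is_canonical` and the `va_bits` parameter are made up for illustration and are not part of any OS or CPU interface.

```python
MASK64 = (1 << 64) - 1
VA_BITS = 48  # assumption: 48 implemented virtual-address bits


def is_canonical(addr: int, va_bits: int = VA_BITS) -> bool:
    """Return True if bits 63 down to va_bits-1 of addr are all equal."""
    addr &= MASK64
    # Keep bit (va_bits - 1) and everything above it.
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


if __name__ == "__main__":
    for a in (0x00007FFFFFFFFFFF, 0x0000800000000000,
              0xFFFF7FFFFFFFFFFF, 0xFFFF800000000000):
        print(f"{a:#018x} canonical: {is_canonical(a)}")
```

The two addresses on either side of the hole (0x0000800000000000 and 0xffff7fffffffffff) are the first and last non-canonical values under this assumption.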
    

    Also, most kernels reserve the high half of the canonical range for their own use (see, for example, the x86-64 Linux memory map). User space can only allocate in the contiguous low range anyway, so the existence of the gap is irrelevant to it.

    Is there a guarantee by the OS that you will never be allocated memory whose address range does not vary by the 47th bit?

    Not exactly. The 48-bit address space supported by current hardware is an implementation detail. The canonical-address rules ensure that future systems can support more virtual address bits without breaking backwards compatibility to any significant degree.

    At most, you'd just need a compat flag to have the OS not give the process any memory regions with high bits not all the same. (Like Linux's current MAP_32BIT flag for mmap, or a process-wide setting). That could support programs that used the high bits for tags and manually redid sign-extension.
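
To make "used the high bits for tags and manually redid sign-extension" concrete, here is a rough sketch using plain integers to stand in for 64-bit pointers. It assumes 48 significant address bits and a 16-bit tag; the helper names `tag_pointer`, `get_tag` and `untag_pointer` are hypothetical, not an existing API.

```python
MASK64 = (1 << 64) - 1
VA_BITS = 48                      # assumption: 48 significant address bits
LOW_MASK = (1 << VA_BITS) - 1     # bits 47..0


def tag_pointer(ptr: int, tag: int) -> int:
    """Overwrite bits 63..48 with a 16-bit tag (result is usually non-canonical)."""
    return ((tag & 0xFFFF) << VA_BITS) | (ptr & LOW_MASK)


def get_tag(tagged: int) -> int:
    """Read the 16-bit tag back out of bits 63..48."""
    return (tagged >> VA_BITS) & 0xFFFF


def untag_pointer(tagged: int) -> int:
    """Redo sign extension from bit 47 so the value can be dereferenced again."""
    low = tagged & LOW_MASK
    if low & (1 << (VA_BITS - 1)):        # bit 47 set: high half
        low |= MASK64 & ~LOW_MASK         # fill bits 63..48 with ones
    return low


if __name__ == "__main__":
    p = 0x00007FFFFFFFD850
    t = tag_pointer(p, 0xABCD)
    print(hex(t), hex(get_tag(t)), hex(untag_pointer(t)))
```

A program written this way depends on bits 63..48 carrying no address information, which is exactly what a wider virtual address space would break unless the OS keeps its mappings inside the 48-bit canonical range.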

    Future hardware won't need to support any kind of flag for ignoring high address bits, because junk in the high bits is currently an error. Intel 5-level paging adds another 9 virtual address bits (57 in total), widening the canonical high and low halves; see Intel's 5-level paging white paper.
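
The effect of the extra 9 bits on the two canonical halves is simple arithmetic. The helper `canonical_halves` below is a made-up name for illustration; it just computes the top of the low half and the bottom of the high half for a given number of virtual-address bits.

```python
def canonical_halves(va_bits: int) -> tuple[int, int]:
    """Return (highest low-half address, lowest high-half address)."""
    half = 1 << (va_bits - 1)
    low_top = half - 1
    high_bottom = (1 << 64) - half
    return low_top, high_bottom


if __name__ == "__main__":
    for bits in (48, 57):  # 4-level and 5-level paging
        low_top, high_bottom = canonical_halves(bits)
        print(f"{bits} bits: low half ends at {low_top:#018x}, "
              f"high half starts at {high_bottom:#018x}")
```

With 57 bits the low half ends at 0x00ffffffffffffff and the high half starts at 0xff00000000000000, so every address that is canonical with 48 bits stays canonical with 57.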

    See also Why in 64bit the virtual address are 4 bits short (48bit long) compared with the physical address (52 bit long)?


    Fun fact: Linux defaults to mapping the stack at the top of the lower range of valid addresses. (Related: Why does Linux favor 0x7f mappings?)

    $ gdb /bin/ls
    ...
    (gdb) b _start
    Function "_start" not defined.
    Make breakpoint pending on future shared library load? (y or [n]) y
    Breakpoint 1 (_start) pending.
    (gdb) r
    Starting program: /bin/ls
    
    Breakpoint 1, 0x00007ffff7dd9cd0 in _start () from /lib64/ld-linux-x86-64.so.2
    (gdb) p $rsp
    $1 = (void *) 0x7fffffffd850
    (gdb) exit
    
    $ calc
    2^47-1
                  0x7fffffffffff
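
As a quick worked check of how close that stack pointer is to the top of the low canonical half, using only the two values from the session above (the variable names are illustrative):

```python
LOW_HALF_TOP = 2**47 - 1          # 0x7fffffffffff, as printed by calc
rsp_at_start = 0x7FFFFFFFD850     # $rsp at _start in the GDB session above

# Distance from the stack pointer to the last valid low-half address.
gap = LOW_HALF_TOP - rsp_at_start

if __name__ == "__main__":
    print(hex(gap), gap)          # 0x27af 10159
```

So in that run the initial stack pointer sat roughly 10 KiB below 2^47-1.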
    

    (Modern GDB can use starti to break before the first user-space instruction executes instead of messing around with breakpoint commands.)
