Address canonical form and pointer arithmetic

前端未结

关注

 1  1169

Happy的楠姐

On AMD64 compliant architectures, addresses need to be in canonical form before being dereferenced.

From the Intel manual, section 3.3.7.1:

In

相关标签:

1条回答

走了就别回头了

2020-11-28 13:55
The canonical address rules mean there is a giant hole in the 64-bit virtual address space. 2^47-1 is not contiguous with the next valid address above it, so a single mmap won't include any of the unusable range of 64-bit addresses.
```
+----------+
| 2^64-1   |   0xffffffffffffffff
| ...      |
| 2^64-2^47|   0xffff800000000000
+----------+
|          |
| unusable |      not to scale: this part is 2^16 times as large
|          |
+----------+
| 2^47-1   |   0x00007fffffffffff
| ...      |
| 0        |   0x0000000000000000
+----------+
```
Also most kernels reserve the high half of the canonical range for their own use. e.g. x86-64 Linux's memory map. User-space can only allocate in the contiguous low range anyway so the existence of the gap is irrelevant.

Is there a guarantee by the OS that you will never be allocated memory whose address range does not vary by the 47th bit?

Not exactly. The 48-bit address space supported by current hardware is an implementation detail. The canonical-address rules ensure that future systems can support more virtual address bits without breaking backwards compatibility to any significant degree.

At most, you'd just need a compat flag to have the OS not give the process any memory regions with high bits not all the same. (Like Linux's current MAP_32BIT flag for mmap, or a process-wide setting). That could support programs that used the high bits for tags and manually redid sign-extension.

Future hardware won't need to support any kind of flag to ignore high address bits or not, because junk in the high bits is currently an error. Intel 5-level paging adds another 9 virtual address bits, widening the canonical high andd low halves. white paper.

See also Why in 64bit the virtual address are 4 bits short (48bit long) compared with the physical address (52 bit long)?

Fun fact: Linux defaults to mapping the stack at the top of the lower range of valid addresses. (Related: Why does Linux favor 0x7f mappings?)
```
$ gdb /bin/ls
...
(gdb) b _start
Function "_start" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (_start) pending.
(gdb) r
Starting program: /bin/ls

Breakpoint 1, 0x00007ffff7dd9cd0 in _start () from /lib64/ld-linux-x86-64.so.2
(gdb) p $rsp
$1 = (void *) 0x7fffffffd850
(gdb) exit

$ calc
2^47-1
              0x7fffffffffff
```
(Modern GDB can use starti to break before the first user-space instruction executes instead of messing around with breakpoint commands.)
0 讨论(0)
发布评论:

提交评论
- 加载中...