What happens for a RIP-relative load next to the current instruction? Cache hit?

Question

I am reading Agner Fog's book on x86 assembly, and I am wondering how RIP-relative addressing works in this scenario. Specifically, assume my RIP offset is +1. This suggests the data I want to read is right next to this instruction in memory.

This piece of data is likely already fetched into the L1 instruction cache. Assuming that this data is not also in the L1d, what exactly will happen on the CPU?

Let's assume it's a relatively recent Intel architecture like Kaby Lake.

Answer 1:

Yes, it's probably hot in L1i cache, as well as the uop cache. The page is also hot in L1iTLB. But all that's irrelevant for a data load.

It might be hot in L2 because of instruction fetch, but it might have been evicted since then (L2 is NINE wrt. L1 caches). So best case is a hit in L2.

L1iTLB and L1dTLB are separate, so it will miss in L1dTLB if this is the first data load from that page. If the unified 2nd-level TLB is a victim cache, it could miss there and even trigger a page walk despite being hot in L1iTLB, but I don't know if L2TLB actually is a victim cache or not in recent Intel CPUs. It would make sense, though; code and data in the same page are usually rare. (Although less rare than code and data in the same line.)

See also "Why do Compilers put data inside .text(code) section of the PE and ELF files and how does the CPU distinguish between data and code?" for some details and discussion. But note that the premise of that question is false: compilers don't do that on x86, because it's the opposite of helpful for performance (it wastes TLB coverage and cache capacity). ARM is different: constant pools between functions are normal there because PC-relative addressing has very limited range. On x86, only some obfuscators might do it.

Quoting the question: "Specifically, assume my RIP offset is +1. This suggests the data I want to read is right next to this instruction in memory."

The rel32 is relative to the end of the current instruction. So no, not right next to; that would be a 1-byte gap.

e.g. like this:

              movzx eax, byte [rip + 1]  
              ret
                            ; could be a page boundary here
load_target:  int3        ; db 0xcc

Note that [RIP+1] could be in a different cache line or even page than the instruction using that addressing mode, if the instruction ends within 0 or 1 byte of a page boundary.
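The arithmetic behind this can be sketched in C. This is only an illustration of the address calculation, not a model of the hardware. It assumes 64-byte cache lines and 4 KiB pages, and the helper names and example addresses are made up for this sketch. The instruction length of 7 used in the usage note is the encoding of movzx eax, byte [rip + 1] (0F B6 05 followed by the 4-byte rel32).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed sizes for this sketch: 64-byte cache lines, 4 KiB pages. */
#define LINE_SIZE 64u
#define PAGE_SIZE 4096u

/* RIP-relative target address. During execution RIP points at the END of
 * the instruction, so the base is insn_addr + insn_len, not insn_addr. */
uint64_t rip_rel_target(uint64_t insn_addr, unsigned insn_len, int32_t rel32)
{
    /* opcode + ModRM + disp32 is at least 6 bytes; x86 instructions are
     * at most 15 bytes long. */
    assert(insn_len >= 6 && insn_len <= 15);
    return insn_addr + insn_len + (uint64_t)(int64_t)rel32;
}

/* True if both addresses fall in the same (assumed 64-byte) cache line. */
bool same_line(uint64_t a, uint64_t b)
{
    return a / LINE_SIZE == b / LINE_SIZE;
}

/* True if both addresses fall in the same (assumed 4 KiB) page. */
bool same_page(uint64_t a, uint64_t b)
{
    return a / PAGE_SIZE == b / PAGE_SIZE;
}
```

For example, with a hypothetical instruction address of 0x401ff8, rip_rel_target(0x401ff8, 7, 1) is 0x402000: the instruction's last byte is at 0x401ffe, the 1-byte gap (the ret) is at 0x401fff, and the load target is the first byte of the next page.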

That 1 byte could even be a ret, so it's possible that this instruction could already be executing without the front-end having already (or ever) fetched from that other line or page, like it would have otherwise. I think you were more interested in the case where you're fetching from the same line that contains the current instruction though. Might as well say mov eax, [RIP - 4] to fetch the -4 rel32 itself from the current instruction's machine code.
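To make the [RIP - 4] case concrete, here is a small C simulation of what such a load would read. It is a sketch under stated assumptions: the byte array stands in for memory, the helper names are hypothetical, and it assumes the rel32 is the last 4 bytes of the instruction (true when there is no immediate operand). The bytes 8B 05 FC FF FF FF are the encoding of mov eax, [rip - 4].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Machine code for: mov eax, [rip - 4]  ->  8B 05 FC FF FF FF */
const uint8_t MOV_EAX_RIP_MINUS_4[6] = { 0x8B, 0x05, 0xFC, 0xFF, 0xFF, 0xFF };

/* Read a little-endian 32-bit value from buf[off .. off+3]. */
uint32_t load_le32(const uint8_t *buf, size_t len, size_t off)
{
    assert(off + 4 <= len);
    return (uint32_t)buf[off]
         | (uint32_t)buf[off + 1] << 8
         | (uint32_t)buf[off + 2] << 16
         | (uint32_t)buf[off + 3] << 24;
}

/* Hypothetical helper: the value a 4-byte RIP-relative load would read,
 * treating code[0 .. insn_len-1] as the only memory and assuming the
 * rel32 occupies the last 4 bytes of the instruction. */
uint32_t simulate_rip_rel_load32(const uint8_t *code, size_t insn_len)
{
    assert(insn_len >= 6);
    /* Reinterpret the stored displacement as a signed 32-bit value. */
    int32_t rel32 = (int32_t)load_le32(code, insn_len, insn_len - 4);
    /* RIP is the offset just past the end of the instruction. */
    int64_t target = (int64_t)insn_len + rel32;
    assert(target >= 0);
    return load_le32(code, insn_len, (size_t)target);
}
```

Calling simulate_rip_rel_load32(MOV_EAX_RIP_MINUS_4, 6) computes a target offset of 6 - 4 = 2, which is where the rel32 starts, so the simulated load returns 0xFFFFFFFC, the displacement bytes themselves.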

Loads don't trigger self-modifying-code pipeline nukes, only stores, so that's fine.

Source: https://stackoverflow.com/questions/62637943/what-happens-for-a-rip-relative-load-next-to-the-current-instruction-cache-hit

Tags

assembly

x86

x86-64

cpu-architecture

cpu-cache