Bring code into the L1 instruction cache without executing it

囚心锁ツ 2021-01-16 10:09

Let's say I have a function that I plan to execute as part of a benchmark. I want to bring this code into the L1 instruction cache prior to executing since I don't want to

3 answers
  •  暗喜 (original poster)
    2021-01-16 10:21

    Map the same physical page to two different virtual addresses.

    The L1 instruction cache (L1I$) is effectively physically addressed. It is VIPT, but all of its index bits come from the page-offset part of the address, so it behaves like PIPT.

    Branch-prediction and uop caches are virtually addressed, so with the right choice of virtual addresses, a warm-up run of the function at the alternate virtual address will prime L1I, but not branch prediction or uop caches. (This only works if branch aliasing happens modulo something larger than 4096 bytes, because the position within the page is the same for both mappings.)
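The double mapping described above can be sketched on Linux roughly as follows. This is a minimal illustration under stated assumptions, not the answer's own code: the names `dual_map`, `dual_map_code`, and `"bench_code"` are hypothetical, the code is assumed to fit in one page, and the kernel is left to pick both virtual addresses. Getting "the right choice of virtual addresses" with respect to branch-predictor aliasing would need address hints, which are not shown here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper type: two virtual views of one physical page. */
struct dual_map {
    void  *timed;  /* address intended for the timed run   */
    void  *alias;  /* address intended for the warm-up run */
    size_t len;    /* length of each mapping (one page)    */
};

/* Copies code_len bytes into a fresh in-memory file and maps its first page
 * twice as read+execute. Returns 0 on success, -1 on failure. */
int dual_map_code(const void *code, size_t code_len, struct dual_map *out)
{
    assert(code != NULL && out != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || code_len > (size_t)page)
        return -1;

    /* memfd_create is invoked through syscall() so the sketch also builds
     * against glibc versions that lack the memfd_create() wrapper. */
    int fd = (int)syscall(SYS_memfd_create, "bench_code", 0);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, page) != 0) {
        close(fd);
        return -1;
    }

    /* Temporary writable view, used only to copy the code in. */
    void *w = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (w == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memcpy(w, code, code_len);
    munmap(w, (size_t)page);

    /* Two executable views backed by the same page of the file. Both start
     * at file offset 0, so the position within the page is identical. */
    void *a = mmap(NULL, (size_t)page, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
    void *b = mmap(NULL, (size_t)page, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
    close(fd); /* the mappings keep the file alive */

    if (a == MAP_FAILED || b == MAP_FAILED) {
        if (a != MAP_FAILED) munmap(a, (size_t)page);
        if (b != MAP_FAILED) munmap(b, (size_t)page);
        return -1;
    }

    out->timed = a;
    out->alias = b;
    out->len = (size_t)page;
    return 0;
}
```

Reading the same bytes back through both `timed` and `alias` is a quick way to confirm that the two addresses really share one page. On architectures without coherent instruction caches, an explicit cache flush would also be needed before executing the copied code.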

    Prime the iTLB by calling a ret instruction that sits in the same page as the test function, but outside it.


    After setting this up, no modification of the page tables is required between the warm-up run and the timing run. This is why you use two mappings of the same page instead of remapping a single mapping.
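To make the order of operations concrete, here is a hedged, x86-64 Linux sketch of the whole sequence: warm-up through the alias address, iTLB priming through a dummy ret, then the run that would be timed. Everything named here is an assumption for illustration: `prime_and_run`, the trivial function under test (`mov eax, 42; ret`), and the placement of the dummy ret at offset 64, chosen so that it sits in a different cache line if lines are 64 bytes. The timer itself is left out and only marked by comments.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

typedef int  (*fn_t)(void);
typedef void (*stub_t)(void);

/* Returns 0 if the sequence ran and the function returned 42 both times,
 * 1 if the architecture is not x86-64 (nothing is executed),
 * -1 on setup failure or an unexpected result. */
int prime_and_run(void)
{
#if defined(__x86_64__)
    /* Hypothetical function under test: mov eax, 42 ; ret */
    static const unsigned char func_code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };
    static const unsigned char ret_code[]  = { 0xC3 };  /* dummy ret */
    const size_t ret_offset = 64;  /* same page, outside the function */

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || (size_t)page <= ret_offset)
        return -1;
    size_t len = (size_t)page;

    int fd = (int)syscall(SYS_memfd_create, "bench_code", 0);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, page) != 0) { close(fd); return -1; }

    /* Fill the page through a temporary writable mapping. */
    unsigned char *w = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (w == MAP_FAILED) { close(fd); return -1; }
    memcpy(w, func_code, sizeof func_code);
    memcpy(w + ret_offset, ret_code, sizeof ret_code);
    munmap(w, len);

    /* Two executable mappings of the same page; set up once, never changed. */
    unsigned char *timed = mmap(NULL, len, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
    unsigned char *alias = mmap(NULL, len, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
    close(fd);
    if (timed == MAP_FAILED || alias == MAP_FAILED) {
        if (timed != MAP_FAILED) munmap(timed, len);
        if (alias != MAP_FAILED) munmap(alias, len);
        return -1;
    }

    /* 1. Warm-up run at the alternate virtual address. */
    int warm = ((fn_t)alias)();

    /* 2. Touch the timed mapping's page by calling the dummy ret, which
     *    is intended to load its iTLB entry without running the function. */
    ((stub_t)(timed + ret_offset))();

    /* 3. The run to be measured: start the timer here ... */
    int result = ((fn_t)timed)();
    /*    ... and stop it here. */

    munmap(timed, len);
    munmap(alias, len);
    return (warm == 42 && result == 42) ? 0 : -1;
#else
    return 1;
#endif
}
```

No system call or page-table change happens between step 1 and step 3, which is the property the two-mapping approach is meant to provide. Whether the predictors and uop cache actually stay cold depends on the microarchitecture and on the two addresses not aliasing, which this sketch does not control.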

    Margaret Bloom suggests that CPUs vulnerable to Meltdown might speculatively fetch instructions from a no-exec page if you jump there (in the shadow of a mispredict so it doesn't actually fault), but that would then require changing the page table, and thus a system call which is expensive and might evict that line of L1I. But if it doesn't pollute the iTLB, you could then re-populate the iTLB entry with a mispredicted branch anywhere into the same page as the function. Or just a call to a dummy ret outside the function in the same page.

    None of this will let you get the uop cache warmed up, though, because it's virtually addressed. OTOH, in real life, if branch predictors are cold then probably the uop cache will also be cold.
