What happens with nested branches and speculative execution?

假如想象 提交于 2021-01-24 06:59:08

问题


Alright, so I know that if a particular conditional branch has a condition that takes time to compute (memory access, for instance), the CPU assumes a condition result and speculatively executes along that path. However, what would happen if, along that path, yet another slow conditional branch pops up (assuming, of course, that the first condition hasn't been resolved yet and the CPU can't just commit the changes)? Does the CPU just speculate inside the speculation? What happens if the last condition is mispredicted but the first wasn't? Does it just rollback all the way?

I'm talking about something like this:

if (value_in_memory == y){
   // computations
   if (another_val_memory == x){
      //computations
   }
}

回答1:


Speculative execution is the regular state of execution, not a special mode that an out of order CPU enters when it sees a branch and then leaves when the branch is no longer in flight.

This is easier to see if you consider that it's not just branches that can fault, but many instructions, including those that access memory, have restrictions on their input values, etc. So any substantial out of order execution implies constant speculation, and CPUs are built around that idea.

So "nested branches" doesn't end up being special in that sense.

Now, modern CPUs have a variety of methods for quick branch misprediction recovery, faster than recovery from other types of faults1. For example they may snapshot the state of the register mapping at some branches, to allow recovery to start before the branch is at the head of the reorder buffer. Since it is not always feasible to snapshot at all branches, there might be complicated heuristics involved to decide where to take snapshots.

I mention this last part because it is one way in which nested branches might matter: when there are lots of branches in flight, you might hit some microarchitectural limits related to the tracking of these branches for recovery purposes. For more details, you can look through patents for "branch order buffer" (for Intel techniques, but there are no doubt others).


1 The basic recovery method is keep executing until the faulting instruction is the next to retire, and then throw away all younger instructions. In the context of branch mispredictions, this means you could actually suffer two or more mispredictions only the oldest of which actually takes effect: e.g., a younger branch mispredicts, and while executing up to that branch (at which point recovery can occur), another mispredict occurs, so the younger one ends up getting discarded.




回答2:


(Maybe not a complete answer, but I had some of this written when @BeeOnRope posted an answer. Posting this anyway for some more links and technical details in case anyone's curious.)


Everything is always speculative until it reaches retirement and becomes non-speculative, definitely happened, part of the architectural state.

e.g. any load might fault with a bad address, any div might trap on divide by zero. See also Out-of-order execution vs. speculative execution That and What exactly happens when a skylake CPU mispredicts a branch? mention that branch mispredicts are handled specially, because they're expected to be frequent. Fast-recovery can start before a mis-predicted branch reaches retirement, unlike the behaviour for a faulting load for example. (That's part of why Meltdown is exploitable.)

So even "regular" instructions are executed speculatively before being commited, and the only distinction between them is a human-made distinction, not computer-made? I presume, then, that the CPU stores multiple, possible rollback points? For instance if I have load instructions that may lead to page faults or simply use stale values, inside a conditional branch, the CPU identifies such instructions and scenarios and saves a state for each of them? I feel like I misunderstood because this may lead to a lot of storing register states and complicated dependencies.

The retirement state is always consistent so you can always roll back to there and discard all in-flight work, e.g. if an external interrupt arrives you want to handle it without waiting for a chain of a dozen cache miss loads to all execute. When an interrupt occurs, what happens to instructions in the pipeline?

This tracking basically happens for free or is something you need to do anyway to be able to detect which instruction faulted, not just that there was a problem somewhere. (This is called "precise exceptions")

The real distinction humans can usefully make is speculation that has a real chance of being wrong during execution of non-error cases. If your code gets a bad pointer, it doesn't really matter how it performs; it's going to page-fault and that's going to be very slow compared to local OoO exec details.


You're talking about a modern out-of-order (OoO) execution (not just fetch) CPU, like modern Intel or AMD x86, high-end ARM, MIPS r10000, etc.

The front-end is in-order (with speculation down predicted paths), and so is commit (aka retirement) from the out-of-order back-end into non-speculative retirement state. (aka known-good architectural state).

The CPU uses two major structures to track instructions (or on x86, uops = parts of instructions) in the back-end. The last stage of the front-end (after fetch / decode) allocates/renames instructions and adds them into both of these structures at once.

  • RS = Reservation Station = scheduler: not-yet-executed instructions, waiting for an execution unit. The RS tracks dependencies and sends the oldest-ready uops to execution units that are ready.
  • ROB = ReOrder Buffer: not-yet-retired instructions. Instructions enter and leave in-order so it can just be a circular buffer.

    Includes a flag to mark each entry as executed or not, set once the RS has sent it to an execution unit which reports success. The oldest instructions in the ROB that all have their done-executing bit set can "retire".

    Also includes a flag which indicates "fault if this reaches retirement". This avoids spending time handling page faults from load instruction on the wrong path of execution (that might well have pointers into an unmapped page), for example. Either in the shadow of a branch mispredict, or just after another instruction (in program order) that should have faulted first but OoO exec got to it later.

(I'm also leaving out register-renaming onto a large physical register file. That's the "rename" part. Allocate includes choosing which execution port an instruction will use, and reserving a load or store buffer entry for memory instructions.)

(There's also a store-buffer; stores don't write directly to L1d cache, they write to the store buffer. This makes it possible to speculatively execute stores and still roll back without them becoming visible to other cores. It also decouples cache-miss stores from execution. Once a store instruction retires, the store-buffer entry "graduates" and is eligible to commit to L1d cache, once MESI gets exclusive access to the cache line, and once memory-ordering rules are satisfied.)


Execution units detect whether an instruction should fault, or was mis-speculated and should roll back, but don't necessarily act on that until the instruction reaches retirement.

In-order retirement is the step that recovers program-order after OoO exec, including the case of exceptions of mis-speculation.


Terminology: Intel calls it "issue" when instructions are sent from the front-end into the ROB + RS. Other computer architecture people often call that "dispatch".

Sending uops from the RS to execution units is called "dispatch" by Intel, "issue" by other people.



来源:https://stackoverflow.com/questions/59209662/what-happens-with-nested-branches-and-speculative-execution

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!