This question was asked a while ago, and deserves a newer answer.
Executive Summary:
“Retpoline” sequences are a software construct which allow indirect branches to be isolated from speculative execution. This may be applied to protect sensitive binaries (such as operating system or hypervisor implementations) from branch target injection attacks against their indirect branches.
The word "retpoline" is a portmanteau of the words "return" and "trampoline", much like the improvement "relpoline" was coined from "relative call" and "trampoline". It is a trampoline construct constructed using return operations which also figuratively ensures that any associated speculative execution will “bounce” endlessly.
In order to mitigate against kernel or cross-process memory disclosure (the Spectre attack), the Linux kernel [1] will be compiled with a new option, -mindirect-branch=thunk-extern, introduced to gcc to perform indirect calls through a so-called retpoline.[1] It's not Linux specific, however: a similar or identical construct seems to be used as part of the mitigation strategies on other OSes.
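To make concrete what the option acts on: an indirect call is any call whose target is only known at run time, such as a call through a function pointer. The sketch below is a hypothetical example (the names `handler_fn`, `handlers`, and `dispatch` are mine, not the kernel's). Built normally, GCC would typically emit an ordinary indirect `call` inside `dispatch`; built with `-mindirect-branch=thunk`, that call should instead be routed through a compiler-generated thunk, and with `thunk-extern` the thunk has to be supplied by the program itself, as the kernel does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler type and table, for illustration only. */
typedef int (*handler_fn)(int);

static int add_one(int x) { return x + 1; }
static int twice(int x)   { return x * 2; }

static const handler_fn handlers[] = { add_one, twice };

/* The call below goes through a function pointer chosen at run time,
 * so it is an indirect branch: the kind of call the option rewrites. */
int dispatch(size_t which, int arg)
{
    assert(which < sizeof handlers / sizeof handlers[0]);
    return handlers[which](arg);
}
```

For example, `dispatch(1, 21)` calls `twice(21)`. The source is the same with or without the option; only the generated machine code for the call differs.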
The use of this compiler option only protects against Spectre V2 in affected processors that have the microcode update required for CVE-2017-5715. It will 'work' on any code (not just a kernel), but only code containing "secrets" is worth attacking.
This appears to be a newly invented term as a Google search turns up only very recent use (generally all in 2018).
The LLVM compiler has had a -mretpoline switch since before Jan 4, 2018, the date on which the vulnerability was first publicly reported. GCC made their patches available on Jan 7, 2018.
The CVE date suggests that the vulnerability was 'discovered' in 2017, but it affects some of the processors manufactured in the past two decades (thus it was likely discovered long ago).
What is a retpoline and how does it prevent the recent kernel information disclosure attacks?
First, a few definitions:
Trampoline - Sometimes referred to as indirect jump vectors, trampolines are memory locations holding addresses that point to interrupt service routines, I/O routines, etc. Execution jumps into the trampoline and then immediately jumps out, or bounces, hence the term trampoline. GCC has traditionally supported nested functions by creating an executable trampoline at run time when the address of a nested function is taken. This is a small piece of code which normally resides on the stack, in the stack frame of the containing function. The trampoline loads the static chain register and then jumps to the real address of the nested function.
Thunk - A thunk is a subroutine used to inject an additional calculation into another subroutine. Thunks are primarily used to delay a calculation until its result is needed, or to insert operations at the beginning or end of the other subroutine.
Memoization - A memoized function "remembers" the results corresponding to some set of specific inputs. Subsequent calls with remembered inputs return the remembered result rather than recalculating it, thus eliminating the primary cost of a call with given parameters from all but the first call made to the function with those parameters.
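The thunk and memoization definitions can be sketched in a few lines of C. This is only an illustration under my own assumptions: `slow_square`, `struct thunk`, `thunk_force`, `memo_square`, and `slow_call_count` are hypothetical names, and the fixed-size table is the simplest possible cache, not a general technique.

```c
#include <assert.h>
#include <stdbool.h>

/* Counts how often the "expensive" work really runs. */
static int slow_calls = 0;

static long slow_square(long n)
{
    slow_calls++;
    return n * n;
}

/* Thunk: a deferred calculation, run only when its value is demanded. */
struct thunk {
    long (*compute)(long);
    long arg;
};

long thunk_force(const struct thunk *t)
{
    assert(t != 0 && t->compute != 0);
    return t->compute(t->arg);
}

/* Memoization: remember results for inputs 0..MEMO_SIZE-1. */
#define MEMO_SIZE 64
static bool memo_known[MEMO_SIZE];
static long memo_value[MEMO_SIZE];

long memo_square(long n)
{
    if (n < 0 || n >= MEMO_SIZE)
        return slow_square(n);          /* outside the table: no caching */
    if (!memo_known[n]) {
        memo_value[n] = slow_square(n); /* first call: do the work */
        memo_known[n] = true;
    }
    return memo_value[n];               /* later calls: remembered result */
}

int slow_call_count(void) { return slow_calls; }
```

Forcing a thunk built as `{ memo_square, 7 }` twice runs `slow_square` only once; the second result comes from the table. The indirect branch predictor does something loosely analogous with branch targets: it remembers where a branch went before and guesses the same again.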
Very roughly, a retpoline is a trampoline with a return as a thunk, to 'spoil' memoization in the indirect branch predictor.
The retpoline includes a PAUSE instruction for Intel, but an LFENCE instruction is necessary for AMD: on that processor PAUSE is not a serializing instruction, so the pause/jmp loop would use excess power while it is speculated over, waiting for the return to mispredict to the correct target.
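The sequence itself is short. What follows is a minimal user-space sketch for x86-64 Linux with GCC or Clang, written as top-level assembly inside a C file; `retpoline_call` and `negate` are hypothetical names for illustration, and the thunks actually emitted by compilers or carried in the kernel differ in detail. The `call` pushes the address of the pause/lfence loop as a return address, so a predicted return speculates into that harmless loop, while the `mov` overwrites the return address on the stack so that the `ret` architecturally lands on the intended target. On other platforms the sketch falls back to a plain indirect call.

```c
#include <assert.h>

/* Hypothetical helper: call fn(arg) through a retpoline where supported. */
long retpoline_call(long (*fn)(long), long arg);

#if defined(__x86_64__) && defined(__linux__) && defined(__GNUC__)
__asm__(
    ".text\n"
    ".globl retpoline_call\n"
    ".type retpoline_call, @function\n"
    "retpoline_call:\n"
    "    mov  %rdi, %r11\n"     /* r11 = branch target (fn)               */
    "    mov  %rsi, %rdi\n"     /* arg becomes fn's first argument        */
    "    call 2f\n"             /* push address of 1: and jump to 2:      */
    "1:  pause\n"               /* speculation trap: a predicted return   */
    "    lfence\n"              /* lands here and spins harmlessly        */
    "    jmp  1b\n"
    "2:  mov  %r11, (%rsp)\n"   /* replace the return address with fn     */
    "    ret\n"                 /* architecturally "returns" into fn      */
    ".size retpoline_call, .-retpoline_call\n"
);
#else
/* Fallback for other targets: an ordinary indirect call, no retpoline. */
long retpoline_call(long (*fn)(long), long arg)
{
    return fn(arg);
}
#endif

/* Hypothetical target function used to exercise the helper. */
long negate(long x) { return -x; }
```

Calling `retpoline_call(negate, 5)` gives the same result as `negate(5)`; only the route the processor takes to reach `negate` differs. Because `fn` is entered with the original caller's return address on top of the stack, its own `ret` goes straight back to that caller.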
Ars Technica has a simple explanation of the problem:
"Each processor has an architectural behavior (the documented behavior that describes how the instructions work and that programmers depend on to write their programs) and a microarchitectural behavior (the way an actual implementation of the architecture behaves). These can diverge in subtle ways. For example, architecturally, a program that loads a value from a particular address in memory will wait until the address is known before trying to perform the load. Microarchitecturally, however, the processor might try to speculatively guess at the address so that it can start loading the value from memory (which is slow) even before it's absolutely certain of which address it should use.
If the processor guesses wrong, it will ignore the guessed-at value and perform the load again, this time with the correct address. The architecturally defined behavior is thus preserved. But that faulty guess will disturb other parts of the processor—in particular the contents of the cache. These microarchitectural disturbances can be detected and measured by timing how long it takes to access data that should (or shouldn't) be in the cache, allowing a malicious program to make inferences about the values stored in memory."
From Intel's paper: "Retpoline: A Branch Target Injection Mitigation" (.PDF):
"A retpoline sequence prevents the processor’s speculative execution from using the 'indirect branch predictor' (one way of predicting program flow) to speculate to an address controlled by an exploit (satisfying element 4 of the five elements of branch target injection (Spectre variant 2) exploit composition listed above)."
Note, element 4 is: "The exploit must successfully influence this indirect branch to speculatively mispredict and execute a gadget. This gadget, chosen by the exploit, leaks the secret data via a side channel, typically by cache-timing."