Why can't kernel code use a Red Zone

问题

It is highly recommended when creating a 64-bit kernel (for x86_64 platform), to instruct the compiler not to use the 128-byte Red Zone that the user-space ABI does. (For GCC the compiler flag is -mno-red-zone).

The kernel would not be interrupt-safe if it is enabled.

But why is that?

回答1:

Quoting from the AMD64 ABI:

The 128-byte area beyond the location pointed to by %rsp is considered to be reserved and shall not be modified by signal or interrupt handlers. Therefore, functions may use this area for temporary data that is not needed across function calls. In particular, leaf functions may use this area for their entire stack frame, rather than adjusting the stack pointer in the prologue and epilogue. This area is known as the red zone.

Essentially, it's an optimization - the userland compiler knows exactly how much of the Red Zone is used at any given time (in the simplest implementation, the entire size of local variables) and can adjust the %rsp accordingly before calling a sub-function.

Especially in leaf functions, this can yield some performance benefits of not having to adjust %rsp as we can be certain no unfamiliar code would run while in the function. (POSIX Signal Handlers might be seen as a form of a co-routine, but you can instruct the compiler to adjust the registers before using stack variables in a signal handler).

In the kernel space, once you start thinking about interrupts, if those interrupts make any assumptions about %rsp, they will likely be incorrect - there is no certainty with regards to the utilization of the Red Zone. So, you either assume all of it is dirty, and needlessly waste stack space (effectively running with a 128-byte guaranteed local variable in every function), or, you guarantee that the interrupts make no assumptions about %rsp - which is tricky.

In user space, context switches + 128-byte overallocation of stack handle it for you.

回答2:

In kernel-space, you're using the same stack that interrupts use. When an interrupt happens, the CPU pushes a return address and RFLAGS. This clobbers 16 bytes below rsp. Even if you wanted to write an interrupt-handler that assumed the full 128 bytes of the red-zone were valuable, it would be impossible.

You could maybe have a kernel-internal ABI that had a small red-zone from rsp-16 to rsp-48 or something. (Small because kernel stack is valuable, and most functions don't need very much red-zone anyway.)

Interrupt handlers would have to sub rsp, 32 before pushing any registers. (and restore it before iret).

This idea won't work if an interrupt handler can itself be interrupted before it runs sub rsp, 32, or after it restores rsp before an iret. There would be a window of vulnerability where valuable data is at rsp .. rsp-16.

Another practical problem with this scheme is that AFAIK gcc doesn't have configurable red-zone parameters. It's either on or off. So you'd have to add support for a kernel flavour of red-zone to gcc / clang if you wanted to take advantage of it.

Even if it was safe from nested interrupts, the benefits are pretty small. The difficulty of proving it's safe in a kernel might make it not worth it. (And as I said, I'm not at all sure it can be implemented safely, because I think nested interrupts are possible.)

(BTW, see the x86 tag wiki for links to the ABI documenting the red-zone, and other stuff.)

回答3:

It is possible to use red-zone in kernel-type contexts. The IDTentry can specify a stack index (ist) of 0..7, where 0 is a bit special. The TSS contains a table of these stacks. 1..7 are loaded, and used for the initial registers saved by the exception/interrupt, and do not nest. If you partition the various exception entries by priorities (eg. NMI is the highest and can happen at any time) and treat these stacks as trampolines, you can safely handle red zones in kernel-type contexts. That is, you can subtract 128 from the saved stack pointer to get a usable kernel stack before enabling interrupts or code which can cause exceptions.

The zero index stack behaves in a more conventional manner, pushing the stack,flags,pc,error on the existing stack when there is no privilege transition.

The code in the trampoline has to be careful (duh, it is a kernel) not to generate other exceptions while it sanitizes the machine state, but provides a nice, safe spot to detect pathological kernel nesting, stack corruption, etc... [ sorry to respond so late, noticed this while searching for something else].

来源：https://stackoverflow.com/questions/25787408/why-cant-kernel-code-use-a-red-zone

标签

x86-64

abi

red-zone

Why can&#39;t kernel code use a Red Zone

问题

回答1:

回答2:

回答3:

Why can't kernel code use a Red Zone