gcc inline assembly using modifier “P” and constraint “p” over “m” in Linux kernel

I'm reading the Linux kernel source code (3.12.5, x86_64) to understand how the process descriptor is handled.

I found to get current process descriptor I could use current_

1 Answer

旧巷少年郎

2020-12-17 19:48

The reason your example code doesn't work is that the "p" constraint is of only very limited use in inline assembly. Every inline assembly operand must be representable as an operand in assembly language. If an operand isn't representable, then the compiler makes it so by moving it to a register first and substituting that register as the operand. The "p" constraint places an additional restriction: the operand must be a valid address. The problem is that a register isn't a valid address. A register can contain an address, but a register is not itself a valid address.

That means the operand of the "p" constraint must have a valid assembly representation as is and must also be a valid address. You're trying to use the address of a variable on the stack as the operand. While this is a valid address, it's not a valid operand. The stack variable itself has a valid representation (something like 8(%rbp)), but the address of the stack variable doesn't. (If it were representable, it would be something like 8 + %rbp, but this isn't a legal operand.)
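For contrast, a stack variable can be handed to inline assembly by value through the "m" constraint, which lets the compiler emit a frame-relative or register-indirect memory operand on its own. The following is only a minimal sketch, assuming GCC or Clang on x86_64; the helper name `load_via_m` is hypothetical and not taken from the kernel, and on other targets the function falls back to a plain C load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the kernel): loads *p through inline
 * assembly using the "m" constraint. On targets other than x86_64 it
 * falls back to an ordinary C load. */
static uint64_t load_via_m(const uint64_t *p)
{
    assert(p != NULL);
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    uint64_t value;
    /* "m"(*p) gives the asm a memory operand such as (%rdi) or -8(%rbp);
     * the compiler picks whatever addressing form is valid. */
    __asm__("movq %1, %0" : "=r"(value) : "m"(*p));
    return value;
#else
    return *p;
#endif
}
```

Calling `load_via_m(&local)` on a local variable returns that variable's current value, because the compiler knows the asm reads memory through the operand.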

One of the few things whose address you can take and use as an operand with the "p" constraint is a statically allocated variable. In this case it's a valid assembly operand, as it can be represented as an immediate value (e.g. &kernel_stack can be represented as $kernel_stack). It's also a valid address, and so it satisfies the constraint.

So that's why the Linux kernel macro works and your macro doesn't. You're trying to use it with stack variables, while the kernel only uses it with statically allocated variables.

Or at least with what looks like a statically allocated variable to the compiler. In fact, kernel_stack is allocated in a special section used for per-CPU data. This section doesn't actually exist at run time; instead it's used as a template to create a separate region of memory for each CPU. The offset of kernel_stack within this special section is used as the offset into each per-CPU data region, so that each CPU stores its own kernel stack value. The FS or GS segment register is used as the base of this region, with each CPU using a different address as the base.
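The "template plus per-CPU base" idea can be sketched in plain user-space C. This is only an analogy under stated assumptions: `NR_CPUS`, `struct percpu_template`, `percpu_base` and the accessor names are hypothetical, an ordinary array stands in for the per-CPU regions, and a function returning a pointer stands in for the FS/GS segment base. It is not how the kernel implements its per-CPU accessors.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NR_CPUS 4   /* hypothetical CPU count for this illustration */

/* Template layout: plays the role of the special per-CPU section. */
struct percpu_template {
    unsigned long kernel_stack;
    int           cpu_number;
};

/* One private copy of the template for each CPU. */
static struct percpu_template percpu_area[NR_CPUS];

/* Stand-in for the segment base: the start of the given CPU's region. */
static char *percpu_base(unsigned cpu)
{
    assert(cpu < NR_CPUS);
    return (char *)&percpu_area[cpu];
}

/* Read kernel_stack for one CPU as base + offset within the template. */
static unsigned long read_kernel_stack(unsigned cpu)
{
    unsigned long value;
    memcpy(&value,
           percpu_base(cpu) + offsetof(struct percpu_template, kernel_stack),
           sizeof value);
    return value;
}

/* Write kernel_stack for one CPU using the same base + offset rule. */
static void write_kernel_stack(unsigned cpu, unsigned long value)
{
    memcpy(percpu_base(cpu) + offsetof(struct percpu_template, kernel_stack),
           &value, sizeof value);
}
```

The same offset is used for every CPU; only the base differs, which is the role the segment register plays in the kernel.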

So that's why the Linux kernel uses inline assembly to access what otherwise looks like a static variable. The macro is used to turn the static variable into a per-CPU variable. If you're not trying to do something like this, then you probably don't have anything to gain by copying the kernel macro. You should probably consider a different way to do what you're trying to accomplish.

Now, if you're thinking that because Linus Torvalds came up with this optimization in the kernel, replacing an "m" constraint with a "p", it must be a good idea to do this generally, you should be very aware of how fragile this optimization is. What it's trying to do is fool GCC into thinking that a reference to kernel_stack doesn't actually access memory, so that GCC won't reload the value every time the code writes to memory. The danger is that if kernel_stack does change, the compiler will be fooled and will continue to use the old value. Linus knows when and how the per-CPU variables are changed, and so can be confident that the macro is safe when used for its intended purpose in the kernel.

If you want to eliminate redundant loads in your own code, I suggest using -fstrict-aliasing and/or the restrict keyword. That way you're not dependent on fragile and non-portable inline assembly macros.
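As a rough illustration of the restrict suggestion, here is a minimal sketch in standard C99. The function names are hypothetical, and whether the redundant load is actually removed depends on the compiler and optimization level.

```c
#include <assert.h>
#include <stddef.h>

/* Without restrict, the compiler must assume dst may alias scale, so it
 * may have to reload *scale after every store to dst[i]. */
void scale_all(int *dst, const int *scale, size_t n)
{
    assert(dst != NULL && scale != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] *= *scale;
}

/* With restrict, the caller promises that dst and scale do not overlap,
 * which permits the compiler to load *scale once and keep it in a
 * register for the whole loop. */
void scale_all_restrict(int *restrict dst, const int *restrict scale,
                        size_t n)
{
    assert(dst != NULL && scale != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] *= *scale;
}
```

Both functions produce the same result when the arguments do not overlap; passing overlapping pointers to the restrict version is undefined behavior, so the promise has to be kept by the caller.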
