Run time overhead of compiler barrier in gcc for x86 processors

Posted by 北城余情 on 2020-01-07 02:52:04

Question


I was looking into the side effects / run-time overhead of using a compiler barrier (in gcc) in an x86 environment.

Compiler barrier: asm volatile("" ::: "memory")

The GCC documentation says something interesting about this ( https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html ).

Excerpt:

The "memory" clobber tells the compiler that the assembly code performs memory reads or writes to items other than those listed in the input and output operands (for example, accessing the memory pointed to by one of the input parameters). To ensure memory contains correct values, GCC may need to flush specific register values to memory before executing the asm. Further, the compiler does not assume that any values read from memory before an asm remain unchanged after that asm; it reloads them as needed. Using the "memory" clobber effectively forms a read/write memory barrier for the compiler.

Question:

1) What register values are flushed?

2) Why do they need to be flushed?

3) Example?

4) Is there any other overhead apart from register flushing?


Answer 1:


Every memory location which another thread might have a pointer to needs to be up to date before the barrier, and reloaded after. So any such value that is live in a register needs to be stored (if dirty), or simply "forgotten about" if the register merely holds a copy of what is still in memory.
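As a minimal sketch of what that "flushing" means in practice (the global counter and function tick are made-up names):

    int counter;                       /* globally visible, so other code could point at it */

    void tick(void)
    {
        counter++;                     /* GCC may keep the incremented value in a register */
        asm volatile("" ::: "memory"); /* dirty register copy must be stored to counter here */
        counter++;                     /* counter is reloaded from memory after the barrier */
    }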

See this gcc non-bug report for the following quote from a gcc developer: a "memory" clobber only includes memory that can be indirectly accessed (thus may be address-taken in this or another compilation unit).
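A hedged sketch of that distinction (the names demo, shared, and local are made up): a purely local variable whose address never escapes is not affected by the clobber, while a globally reachable one is.

    int shared;                          /* address-taken / reachable from other code */

    int demo(void)
    {
        int local = 1;                   /* address never taken: may stay in a register */
        shared = 2;                      /* must actually be in memory before the barrier */
        asm volatile("" ::: "memory");
        return local + shared;           /* shared is reloaded; local is not spilled */
    }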

Is there any other overhead apart from register flushing?

A barrier can prevent optimizations like sinking a store out of a loop, but that is usually exactly why you use a barrier. Make sure your loop counters and loop variables are locals whose addresses have not been passed to functions the compiler can't see into; otherwise they will have to be spilled and reloaded inside the loop (see the sketch below). Letting references escape your function is always a potential problem for optimization, but with barriers it is a near-guarantee of worse code.
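A sketch of that pitfall, assuming a hypothetical external function note_address() that the compiler cannot see into:

    void note_address(int *p);             /* hypothetical, defined in another translation unit */

    long sum_slow(const int *a, int n)
    {
        long total = 0;                    /* address never escapes: stays in a register */
        int i;
        note_address(&i);                  /* i's address escapes, so i now counts as "memory" */
        for (i = 0; i < n; i++) {
            total += a[i];
            asm volatile("" ::: "memory"); /* i must be stored/reloaded every iteration */
        }
        return total;
    }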


Why?

This is the whole point of a barrier: so values are synced to memory, preventing compile-time reordering.

asm volatile("" ::: "memory") is (exactly?) equivalent to atomic_signal_fence(memory_order_seq_cst) (not atomic_thread_fence, which would take an mfence instruction to implement on x86).
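A sketch of the three spellings side by side (C11 <stdatomic.h>; the comments describe typical x86 code generation, not a guarantee):

    #include <stdatomic.h>

    void fences(void)
    {
        asm volatile("" ::: "memory");             /* GNU C compiler barrier */
        atomic_signal_fence(memory_order_seq_cst); /* same idea: compile-time ordering only, no instruction emitted */
        atomic_thread_fence(memory_order_seq_cst); /* run-time barrier: typically an mfence (or a locked op) on x86 */
    }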


Examples:

See Jeff Preshing's Memory Ordering at Compile Time article for more about why, and examples with actual x86 asm.



Source: https://stackoverflow.com/questions/38884893/run-time-overhead-of-compiler-barrier-in-gcc-for-x86-processors
