x86: Are memory barriers needed here?

Submitted by 别等时光非礼了梦想 on 2019-12-13 17:37:42

Question


In WB-memory, a = b = 0

P1:
a = 1
SFENCE
b = 1

P2:
WHILE (b == 0) {}
LFENCE
ASSERT (a == 1)

It is my understanding that neither the SFENCE nor the LFENCE is needed here.

That is because, for this memory type, x86 guarantees:

  1. Reads can't be reordered with older reads
  2. Stores can't be reordered with older stores
  3. Stores are transitively visible

Answer 1:


The lfence and sfence asm instructions are no-ops as far as memory ordering goes unless you're using NT stores (or NT loads from WC memory, e.g. video RAM). (Actually, movntdqa loads might only be ordered by mfence on paper, not lfence, in which case I don't know when you'd ever use lfence for ordering. It was added to the ISA in SSE2 along with mfence, after sfence and the first NT stores arrived in SSE and before movntdqa, possibly just for completeness or in case it was ever needed.)

There is sometimes confusion around this point, because the C/C++ intrinsics for lfence and sfence are also compiler barriers. A compiler barrier is needed in C/C++, but it can be had more cheaply with GNU C asm("":::"memory"); or (to order relaxed-atomic operations, see footnote 1) std::atomic_signal_fence(std::memory_order_acq_rel). Either one restricts compile-time reordering without making the compiler emit any useless asm barrier instructions.
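As a minimal sketch of the two compiler-only barriers mentioned above: the names payload, ready, publish_signal_fence, publish_asm_barrier and try_consume are hypothetical, chosen only for illustration. Note the assumption baked in: ISO C++ only defines atomic_signal_fence for ordering between a thread and a signal handler running in that same thread, so using it across threads leans on the x86 hardware memory model and is not portable.

```cpp
#include <atomic>

// Hypothetical shared variables for illustration.
std::atomic<int> payload{0};
std::atomic<int> ready{0};

// Writer using the standard compiler-only fence: no fence instruction is emitted.
void publish_signal_fence(int value) {
    payload.store(value, std::memory_order_relaxed);
    // Stops the compiler from moving the stores across this point.
    std::atomic_signal_fence(std::memory_order_acq_rel);
    ready.store(1, std::memory_order_relaxed);
}

#if defined(__GNUC__)
// Same idea with the GNU C extension: an empty asm statement with a
// "memory" clobber (assumption: GCC or a compatible compiler such as Clang).
void publish_asm_barrier(int value) {
    payload.store(value, std::memory_order_relaxed);
    asm volatile("" ::: "memory");
    ready.store(1, std::memory_order_relaxed);
}
#endif

// Reader: returns the payload if the flag is set, or -1 if nothing was published yet.
int try_consume() {
    if (ready.load(std::memory_order_relaxed) == 0) {
        return -1;
    }
    // Keeps the compiler from hoisting the payload load above the flag load.
    std::atomic_signal_fence(std::memory_order_acq_rel);
    return payload.load(std::memory_order_relaxed);
}
```

Calling publish_signal_fence(7) and then try_consume() in the same thread returns 7. Code that must also run on weakly-ordered architectures would use release/acquire operations or atomic_thread_fence instead.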


Run-time reordering is already blocked by the x86 memory model, except for StoreLoad reordering, which requires mfence (or a locked instruction) to block. lfence + sfence don't add up to mfence. See "Does it make any sense instruction LFENCE in processors x86/x86_64?" and various other SO Q&As about these instructions.
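The StoreLoad case can be illustrated with the classic store-buffer litmus test. This is only a sketch: x, y, store_then_load and run_store_buffer_round are hypothetical names. Each thread stores to one variable and then loads the other. With a seq_cst fence between the store and the load, at least one thread must see the other's store. On x86, compilers typically implement that fence with mfence or a locked instruction.

```cpp
#include <atomic>
#include <thread>

// Hypothetical shared variables for illustration.
std::atomic<int> x{0};
std::atomic<int> y{0};

// Store to one variable, then load the other, with a full fence in between.
int store_then_load(std::atomic<int>& mine, std::atomic<int>& other) {
    mine.store(1, std::memory_order_relaxed);
    // Full barrier: keeps the later load from becoming visible before the earlier store.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return other.load(std::memory_order_relaxed);
}

// Runs one round; returns true if at least one thread saw the other's store.
bool run_store_buffer_round() {
    x.store(0);
    y.store(0);
    int r1 = -1;
    int r2 = -1;
    std::thread t1([&] { r1 = store_then_load(x, y); });
    std::thread t2([&] { r2 = store_then_load(y, x); });
    t1.join();
    t2.join();
    return r1 == 1 || r2 == 1;
}
```

If the seq_cst fence is weakened to acq_rel (or removed), the outcome r1 == 0 and r2 == 0 becomes permitted, and on x86 it can actually happen because each store may still sit in the store buffer when the following load executes.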

This is why std::atomic_thread_fence(std::memory_order_acq_rel) also compiles to zero instructions on x86, but to barriers on weakly-ordered architectures.
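For comparison, here is a sketch of the question's P1/P2 pattern in portable C++, using relaxed atomics plus acq_rel thread fences in place of SFENCE and LFENCE. The function names producer, consumer and run_message_passing are hypothetical; a and b mirror the question's variables. On x86 the two fences are expected to produce no instructions and only constrain the compiler.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Mirror the question's variables; both start at 0.
std::atomic<int> a{0};
std::atomic<int> b{0};

// P1: write the data, then set the flag.
void producer() {
    a.store(1, std::memory_order_relaxed);
    // Acts as a release fence here: the store to a is ordered before the store to b.
    std::atomic_thread_fence(std::memory_order_acq_rel);
    b.store(1, std::memory_order_relaxed);
}

// P2: wait for the flag, then check the data. Returns the value of a it observed.
int consumer() {
    while (b.load(std::memory_order_relaxed) == 0) {
        // spin until P1 sets the flag
    }
    // Acts as an acquire fence here: the load of b is ordered before the load of a.
    std::atomic_thread_fence(std::memory_order_acq_rel);
    int seen = a.load(std::memory_order_relaxed);
    assert(seen == 1);  // the question's ASSERT
    return seen;
}

// Runs both sides once and returns the value of a that P2 observed.
int run_message_passing() {
    a.store(0);
    b.store(0);
    int seen = -1;
    std::thread p2([&] { seen = consumer(); });
    std::thread p1(producer);
    p1.join();
    p2.join();
    return seen;
}
```

Calling run_message_passing() returns 1 on any conforming implementation, since the fence in producer synchronizes with the fence in consumer once the consumer has read b == 1.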


lfence is also a serializing instruction on Intel microarchitectures (but maybe not AMD?). It has been all along, but Intel recently made this guarantee official so Spectre mitigation techniques could safely use it instead of a much more inconvenient cpuid.


  • Footnote 1:

atomic_signal_fence on gcc may also be a compiler barrier for plain non-atomic variables; it was last time I checked with gcc (while atomic_thread_fence wasn't), but this is probably just an implementation detail when there aren't any atomic variables involved. When there are atomic variables, the compiler knows that those variables may provide ordering that lets other threads access non-atomic variables without UB, so ordering is needed.



Source: https://stackoverflow.com/questions/49957408/x86-are-memory-barriers-needed-here
