Which is a better write barrier on x86: lock+addl or xchgl?

我在风中等你 2020-11-29 03:14

The Linux kernel uses lock; addl $0,0(%%esp) as a write barrier, while the RE2 library uses xchgl (%0),%0 as a write barrier. What's the difference an

5 answers

暗喜 (original poster)

2020-11-29 03:37
Quoting from the IA-32 manuals (Vol. 3A, Section 8.2: Memory Ordering):
In a single-processor system for memory regions defined as write-back cacheable, the memory-ordering model respects the following principles [..]
- Reads are not reordered with other reads
- Writes are not reordered with older reads
- Writes to memory are not reordered with other writes, with the exception of
  - writes executed with the CLFLUSH instruction
  - streaming stores (writes) executed with the non-temporal move instructions ([list of instructions here])
  - string operations (see Section 8.2.4.1)
- Reads may be reordered with older writes to different locations but not with older writes to the same location.
- Reads or writes cannot be reordered with I/O instructions, locked instructions, or serializing instructions
- Reads cannot pass LFENCE and MFENCE instructions
- Writes cannot pass SFENCE and MFENCE instructions
Note: The "In a single-processor system" above is slightly misleading. The same rules hold for each (logical) processor individually; the manual then goes on to describe the additional ordering rules between multiple processors. Of those additional rules, the only one relevant to this question is that
- Locked instructions have a total order.
In short, as long as you're writing to write-back memory (which is all the memory you'll ever see unless you're a driver or graphics programmer), x86 is almost sequentially consistent: the only reordering an x86 CPU can perform is to let later reads of a different location execute before older writes. The main thing about both write barriers is that they carry a lock prefix (explicit in lock; addl, implicit in xchgl with a memory operand), which forbids all reordering across the instruction and ensures that the operation is seen in the same order by all processors in a multi-processor system.
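
To make the one permitted reordering concrete, here is a minimal sketch of the classic store-buffer litmus test in portable C++. It is an illustration only, not code from the Linux kernel or RE2: the function name count_both_zero and its parameters are made up for this example, and std::atomic_thread_fence stands in for a hand-written barrier. How the compiler implements that fence on x86 (MFENCE or a locked instruction) is its own choice.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Store-buffer litmus test (hypothetical helper, for illustration).
// Each thread stores to one variable and then loads the other one.
// Without a full barrier between the store and the load, the load may
// complete before the store becomes visible, so both threads can read 0.
// Returns how many of `iterations` runs ended with both loads seeing 0.
int count_both_zero(int iterations, bool use_full_barrier) {
    assert(iterations > 0);
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);        // older write
            if (use_full_barrier)
                std::atomic_thread_fence(std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_relaxed);       // later read
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);        // older write
            if (use_full_barrier)
                std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_relaxed);       // later read
        });

        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

With use_full_barrier set to true the function must return 0, because the two fences are totally ordered and the thread whose fence comes second is guaranteed to see the other thread's store. With it set to false, a non-zero count is allowed but not guaranteed; thread start-up skew makes the reordering rare in a test this simple.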

Also, in write-back memory, reads are never reordered with other reads, so there's no need for read barriers. Recent x86 processors have a weaker memory consistency model for streaming stores and write-combined memory (commonly used for mapped graphics memory). That's where the various fence instructions come into play; they're not necessary for any other memory type, but some drivers in the Linux kernel do deal with write-combined memory, so they just define their read barrier that way. The list of ordering models per memory type is in Section 11.3.1 in Vol. 3A of the IA-32 manuals. Short version: Write-Through, Write-Back and Write-Protected allow speculative reads (following the rules detailed above); Uncacheable and Strong Uncacheable memory have strong ordering guarantees (no processor reordering, reads and writes are executed immediately; used for MMIO); and Write Combined memory has weak ordering (i.e. relaxed ordering rules that need fences).
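
As an illustration of why ordinary write-back memory needs no read barrier, here is a small message-passing sketch in C++. The function name publish_and_read is hypothetical and exists only for this example. It relies on release/acquire ordering, which mainstream compilers typically lower to plain MOV instructions on x86, since the hardware already keeps writes ordered with writes and reads ordered with reads; the annotations mainly stop the compiler from reordering.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Message passing (hypothetical helper, for illustration).
// The producer writes the payload and then sets a flag; the consumer waits
// for the flag and then reads the payload. Returns what the consumer read.
int publish_and_read(int value) {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag that guards the payload
    int seen = -1;

    std::thread producer([&] {
        payload = value;                               // write 1
        ready.store(true, std::memory_order_release);  // write 2, stays after write 1
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) // read 1: wait for the flag
            std::this_thread::yield();
        seen = payload;                                // read 2, stays after read 1
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Calling publish_and_read(42) returns 42: once the consumer observes the flag, the payload written before it is visible as well. The same pattern on write-combined memory or with non-temporal stores would need the explicit fence instructions mentioned above.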