Which is a better write barrier on x86: lock+addl or xchgl?

我在风中等你 2020-11-29 03:14

The Linux kernel uses lock; addl $0,0(%%esp) as a write barrier, while the RE2 library uses xchgl (%0),%0 as a write barrier. What's the difference an

5 answers
  •  暗喜 (original poster)
    2020-11-29 03:37

    Quoting from the IA32 manuals (Vol 3A, Chapter 8.2: Memory Ordering):

    In a single-processor system for memory regions defined as write-back cacheable, the memory-ordering model respects the following principles [..]

    • Reads are not reordered with other reads
    • Writes are not reordered with older reads
    • Writes to memory are not reordered with other writes, with the exception of
      • writes executed with the CLFLUSH instruction
      • streaming stores (writes) executed with the non-temporal move instructions ([list of instructions here])
      • string operations (see Section 8.2.4.1)
    • Reads may be reordered with older writes to different locations but not with older writes to the same location.
    • Reads or writes cannot be reordered with I/O instructions, locked instructions, or serializing instructions
    • Reads cannot pass LFENCE and MFENCE instructions
    • Writes cannot pass SFENCE and MFENCE instructions
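
The one reordering the list permits ("reads may be reordered with older writes to different locations") is the classic store-buffering case. A minimal sketch of that litmus test follows, assuming C11 atomics and POSIX threads; the names count_both_zero, use_fence, thread0 and thread1 are hypothetical and not taken from the manual or the kernel. With a full fence between each store and the following load, both threads reading 0 is impossible; without it, that outcome is allowed, although it may be rare in this simple harness because thread start-up dominates the timing.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x, y;   /* the two shared locations */
static int r0, r1;        /* value each thread observed */
static int use_fence;     /* hypothetical knob: 1 = full fence between store and load */

static void *thread0(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);   /* older write */
    if (use_fence)
        atomic_thread_fence(memory_order_seq_cst);        /* full barrier */
    r0 = atomic_load_explicit(&y, memory_order_relaxed);  /* later read, other location */
    return NULL;
}

static void *thread1(void *arg)
{
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    if (use_fence)
        atomic_thread_fence(memory_order_seq_cst);
    r1 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

/* Runs the test `iters` times and returns how often both reads saw 0. */
static int count_both_zero(int iters, int fence)
{
    int hits = 0;
    use_fence = fence;
    for (int i = 0; i < iters; i++) {
        pthread_t a, b;
        int rc;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        rc = pthread_create(&a, NULL, thread0, NULL);
        assert(rc == 0);
        rc = pthread_create(&b, NULL, thread1, NULL);
        assert(rc == 0);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (r0 == 0 && r1 == 0)
            hits++;
    }
    return hits;
}
```

Calling count_both_zero(n, 1) should always return 0; count_both_zero(n, 0) may return a non-zero count on a multi-core x86 machine, which is the read-before-older-write reordering showing through.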

    Note: The "In a single-processor system" above is slightly misleading. The same rules hold for each (logical) processor individually; the manual then goes on to describe the additional ordering rules between multiple processors. The only one of those that pertains to the question is that

    • Locked instructions have a total order.

    In short, as long as you're writing to write-back memory (which is all memory you'll ever see unless you're a driver or graphics programmer), x86 is almost sequentially consistent: the only reordering the CPU can perform is to let a later read of a different location execute before an older write. The key property of both write barriers is that they carry a lock prefix, explicit in lock; addl and implicit in xchgl with a memory operand. A locked instruction forbids reordering across it and ensures that the operation is seen in the same order by all processors in a multi-processor system.
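
For illustration, the two locked forms can be wrapped as small C helpers. This is only a sketch under stated assumptions: it uses GCC/Clang inline asm, the helper names barrier_lock_add and barrier_xchg are hypothetical, the x86-64 branch substitutes %rsp for %esp, and the xchg helper swaps a register with a local scratch variable rather than reproducing RE2's exact operand form. On other architectures it falls back to a C11 sequentially consistent fence.

```c
#include <assert.h>
#include <stdatomic.h>

/* Full barrier via a locked add of 0 to the top of the stack,
   in the style of the kernel snippet quoted in the question.
   Adding 0 leaves the memory contents unchanged. */
static inline void barrier_lock_add(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__("lock; addl $0,0(%%rsp)" ::: "memory", "cc");
#elif defined(__i386__)
    __asm__ __volatile__("lock; addl $0,0(%%esp)" ::: "memory", "cc");
#else
    atomic_thread_fence(memory_order_seq_cst);  /* portable fallback */
#endif
}

/* Full barrier via xchg with a memory operand.
   xchg with memory is implicitly locked, so no lock prefix is written. */
static inline void barrier_xchg(void)
{
#if defined(__x86_64__) || defined(__i386__)
    int scratch = 0;  /* hypothetical dummy memory location */
    int reg = 0;
    __asm__ __volatile__("xchgl %0,%1"
                         : "+r"(reg), "+m"(scratch)
                         :
                         : "memory");
#else
    atomic_thread_fence(memory_order_seq_cst);  /* portable fallback */
#endif
}
```

Either helper would be placed between a store and a later load that must not be allowed to pass it. The "memory" clobber additionally stops the compiler from moving memory accesses across the barrier.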

    Also, in write-back memory, reads are never reordered with other reads, so there's no need for read barriers. Recent x86 processors have a weaker memory consistency model for streaming stores and write-combined memory (commonly used for mapped graphics memory). That's where the various fence instructions come into play; they're not necessary for any other memory type, but some drivers in the Linux kernel do deal with write-combined memory, so the kernel just defines its read barrier that way.

    The ordering model for each memory type is listed in Section 11.3.1 of Vol. 3A of the IA-32 manuals. Short version: Write-Through, Write-Back and Write-Protected memory allow speculative reads (following the rules detailed above); Uncacheable and Strong Uncacheable memory have strong ordering guarantees (no processor reordering; reads and writes are executed immediately; used for MMIO); and Write Combined memory has weak ordering (i.e. relaxed ordering rules that need fences).
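
Streaming stores are the case where a fence is needed even on ordinary write-back memory. The sketch below assumes the SSE2 intrinsics from <emmintrin.h> are available (with a plain-store fallback otherwise); publish_buffer and its parameters are hypothetical names chosen for this example. It fills a buffer with non-temporal stores and issues SFENCE before setting a flag, so the weakly ordered stores cannot become visible after the flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_stream_si32, _mm_sfence */
#endif

/* Fills buf[0..n) with `value`, then publishes it by setting *ready to 1. */
static void publish_buffer(int *buf, size_t n, int value, atomic_int *ready)
{
#if defined(__SSE2__)
    for (size_t i = 0; i < n; i++)
        _mm_stream_si32(&buf[i], value);  /* non-temporal, weakly ordered store */
    _mm_sfence();                          /* order the streaming stores before the flag store */
#else
    for (size_t i = 0; i < n; i++)
        buf[i] = value;                    /* ordinary stores: the release store below suffices */
#endif
    atomic_store_explicit(ready, 1, memory_order_release);
}
```

A consumer would load the flag with acquire ordering and read the buffer only after seeing 1. With ordinary stores the SFENCE would be unnecessary on x86, which matches the point that fences matter only for the weakly ordered cases.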
