Which is a better write barrier on x86: lock+addl or xchgl?

我在风中等你 2020-11-29 03:14

The Linux kernel uses lock; addl $0,0(%%esp) as a write barrier, while the RE2 library uses xchgl (%0),%0 as a write barrier. What's the difference an

5 answers
  •  迷失自我
    2020-11-29 03:35

    The "lock; addl $0,0(%%esp)" form is faster when we also want to test whether the lock variable at address (%%esp) is zero. Adding 0 leaves the variable unchanged, and the zero flag is set to 1 if the value at address (%%esp) is 0.
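    As a rough illustration of the flag behaviour described above, here is a minimal sketch assuming GCC 6+ or a recent Clang on x86 (flag-output constraints are a compiler extension). The helper name locked_is_zero is made up for this example, and the non-x86 branch is only a stand-in so the snippet builds elsewhere.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if *p is zero, 0 otherwise.
   On x86 it reads the zero flag left behind by `lock addl $0`. */
static inline int locked_is_zero(volatile int32_t *p)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned char zf;
    /* Adding 0 leaves *p unchanged; ZF is set when the result is 0.
       "=@ccz" asks the compiler to hand back ZF as a 0/1 value. */
    __asm__ __volatile__("lock; addl $0, %0"
                         : "+m"(*p), "=@ccz"(zf)
                         :
                         : "memory");
    return zf;
#else
    /* Stand-in for other targets: atomic read-modify-write of +0. */
    return __atomic_fetch_add(p, 0, __ATOMIC_SEQ_CST) == 0;
#endif
}
```

    Calling locked_is_zero(&v) acts as a full barrier on x86 and reports whether v was zero, without modifying v.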


    lfence from Intel datasheet:

    Performs a serializing operation on all load-from-memory instructions that were issued prior the LFENCE instruction. This serializing operation guarantees that every load instruction that precedes in program order the LFENCE instruction is globally visible before any load instruction that follows the LFENCE instruction is globally visible.

    (Editor's note: mfence or a locked operation is the only useful fence (after a store) for sequential consistency. lfence does not block StoreLoad reordering by the store buffer.)


    For instance: memory write instructions such as 'mov' are atomic (they don't need the lock prefix) if they are properly aligned. But such a store is normally carried out in the CPU cache and is not globally visible to all other threads right away; a memory fence must be performed first to make this thread wait until its previous stores are visible to other threads.
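    To make the store-then-wait idea concrete, here is a small sketch using C11 atomics rather than raw assembly. The names flag_a, flag_b and try_enter_a are invented for the example. On x86 a seq_cst fence is typically compiled to mfence or a locked instruction, though that depends on the compiler.

```c
#include <stdatomic.h>

/* Hypothetical flags for a Dekker-style handshake between two threads. */
static atomic_int flag_a;
static atomic_int flag_b;

/* Thread A's side: announce intent, then check the other thread's flag.
   Without the fence, the load of flag_b could be satisfied while the
   store to flag_a is still sitting in the store buffer. */
static int try_enter_a(void)
{
    atomic_store_explicit(&flag_a, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* full (StoreLoad) barrier */
    return atomic_load_explicit(&flag_b, memory_order_relaxed) == 0;
}
```

    Thread B would run the mirror image with the two flags swapped. With the fence in both threads, at most one of them can see the other's flag as 0.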


    So the main difference between these two instructions is that the xchgl instruction has no effect on the condition flags. We can certainly test the state of the lock variable with a lock cmpxchg instruction, but that is still more complex than using lock add $0.
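    For comparison, the two barrier styles from the question could be wrapped roughly like this with GCC/Clang inline asm. This is only a sketch: the function names and the scratch variable are made up, the x86-64 branch uses %rsp where the question's 32-bit form uses %esp, and the non-x86 branch is just a fallback so the code builds elsewhere.

```c
#include <stdint.h>

/* Barrier via a locked add of 0 to the top of the stack.
   Memory is left unchanged, but the flags are overwritten ("cc"). */
static inline void barrier_lock_add(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__("lock; addl $0, 0(%%rsp)" : : : "memory", "cc");
#elif defined(__i386__)
    __asm__ __volatile__("lock; addl $0, 0(%%esp)" : : : "memory", "cc");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST);  /* non-x86 fallback */
#endif
}

/* Barrier via xchg with a memory operand. The lock prefix is implicit
   for xchg with memory, and the flags are not touched. */
static inline void barrier_xchg(void)
{
#if defined(__x86_64__) || defined(__i386__)
    int32_t slot = 0;  /* hypothetical scratch location on the stack */
    int32_t tmp = 0;
    __asm__ __volatile__("xchgl %1, %0"
                         : "+m"(slot), "+r"(tmp)
                         :
                         : "memory");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST);  /* non-x86 fallback */
#endif
}
```

    Both variants also act as compiler barriers because of the "memory" clobber. The xchg version needs a register and a scratch memory location, while the lock add version needs neither.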
