How is a memory barrier used in the Linux kernel?

Question

There is an illustration in the kernel source, in Documentation/memory-barriers.txt, like this:

    CPU 1                   CPU 2
    ======================= =======================
            { B = 7; X = 9; Y = 8; C = &Y }
    STORE A = 1
    STORE B = 2
    <write barrier>
    STORE C = &B            LOAD X
    STORE D = 4             LOAD C (gets &B)
                            LOAD *C (reads B)

Without intervention, CPU 2 may perceive the events on CPU 1 in some effectively random order, despite the write barrier issued by CPU 1:

    +-------+       :      :                :       :
    |       |       +------+                +-------+  | Sequence of update
    |       |------>| B=2  |-----       --->| Y->8  |  | of perception on
    |       |  :    +------+     \          +-------+  | CPU 2
    | CPU 1 |  :    | A=1  |      \     --->| C->&Y |  V
    |       |       +------+       |        +-------+
    |       |   wwwwwwwwwwwwwwww   |        :       :
    |       |       +------+       |        :       :
    |       |  :    | C=&B |---    |        :       :       +-------+
    |       |  :    +------+   \   |        +-------+       |       |
    |       |------>| D=4  |    ----------->| C->&B |------>|       |
    |       |       +------+       |        +-------+       |       |
    +-------+       :      :       |        :       :       |       |
                                   |        :       :       |       |
                                   |        :       :       | CPU 2 |
                                   |        +-------+       |       |
        Apparently incorrect --->  |        | B->7  |------>|       |
        perception of B (!)        |        +-------+       |       |
                                   |        :       :       |       |
                                   |        +-------+       |       |
        The load of X holds --->    \       | X->9  |------>|       |
        up the maintenance           \      +-------+       |       |
        of coherence of B             ----->| B->2  |       +-------+
                                            +-------+
                                            :       :

I don't understand. Since CPU 1 issues a write barrier, every store before it must have taken effect by the time C = &B is executed, which means B already equals 2 at that point. So when CPU 2 loads C and gets &B, B should already be 2. Why would CPU 2 perceive B as 7? I am really confused.

Answer 1:

The key missing point is the mistaken assumption that for the sequence:

    LOAD C (gets &B)
    LOAD *C (reads B)

the first load has to precede the second load. A weakly ordered architecture can act "as if" the following happened:

    LOAD B (reads B)
    LOAD C (reads &B)
    if (C != &B)
        LOAD *C
    else
        Congratulate self on having already loaded *C

The speculative "LOAD B" can happen, for example, because B was on the same cache line as some other variable of earlier interest or hardware prefetching grabbed it.
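The illustration in memory-barriers.txt goes on to fix this by placing a data dependency barrier between LOAD C and LOAD *C on CPU 2. As a rough userspace analogue, here is a minimal sketch using C11 atomics and pthreads rather than the kernel's own primitives. It assumes that a release store can stand in for the write barrier plus STORE C = &B, and that an acquire load (stronger than a pure dependency barrier) can stand in for the reader-side barrier. The function names and the driver `run_dependency_example` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Same initial state as the illustration: { B = 7; Y = 8; C = &Y } */
int A, B = 7, Y = 8;
_Atomic(int *) C = &Y;

/* "CPU 1": plain stores, then publish &B through C. */
static void *cpu1_publish(void *unused)
{
    (void)unused;
    A = 1;
    B = 2;
    /* Release store: stands in for <write barrier> + STORE C = &B. */
    atomic_store_explicit(&C, &B, memory_order_release);
    return NULL;
}

/* "CPU 2": wait until C points at B, then read through the pointer. */
static void *cpu2_consume(void *seen)
{
    int *p;

    do {
        /* Acquire load: the reader-side ordering the illustration lacks. */
        p = atomic_load_explicit(&C, memory_order_acquire);
    } while (p != &B);

    *(int *)seen = *p;  /* ordered after the load of C, so not the stale 7 */
    return NULL;
}

/* Hypothetical driver: runs both threads once, returns what "CPU 2" read. */
int run_dependency_example(void)
{
    pthread_t t1, t2;
    int seen = -1;
    int rc;

    rc = pthread_create(&t2, NULL, cpu2_consume, &seen);
    assert(rc == 0);
    rc = pthread_create(&t1, NULL, cpu1_publish, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return seen;
}
```

With the acquire load in place, the read of `*p` may not be satisfied by a value fetched before `C` was observed to hold `&B`, which is exactly the reordering described above.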

Answer 2:

From the section of the document titled "WHAT MAY NOT BE ASSUMED ABOUT MEMORY BARRIERS?":

There is no guarantee that any of the memory accesses specified before a memory barrier will be complete by the completion of a memory barrier instruction; the barrier can be considered to draw a line in that CPU's access queue that accesses of the appropriate type may not cross.

and

There is no guarantee that a CPU will see the correct order of effects from a second CPU's accesses, even if the second CPU uses a memory barrier, unless the first CPU also uses a matching memory barrier (see the subsection on "SMP Barrier Pairing").

What memory barriers do (in a very simplified way, of course) is make sure that neither the compiler nor the CPU hardware makes any clever attempt at reordering load (or store) operations across a barrier, and that the CPU correctly perceives changes to memory made by other parts of the system. This is necessary when the loads (or stores) carry additional meaning, such as taking a lock before accessing the data it protects. In this case, letting the compiler or CPU make the accesses more efficient by reordering them is hazardous to the correct operation of our program.

When reading this document we need to keep two things in mind:

1. That a load means transmitting a value from memory (or cache) to a CPU register.
2. That unless the CPUs share the cache (or have no cache at all), it is possible for their cache systems to be momentarily out of sync.

Fact #2 is one of the reasons why one CPU can perceive the data differently from another. Cache systems are designed to provide good performance and coherence in the general case, but they might need some help in specific cases like the ones illustrated in the document.

In general, as the document suggests, barriers in systems involving more than one CPU should be paired to force the system to synchronize the perception of both (or all participating) CPUs. Picture a situation in which one CPU completes its loads or stores and main memory is updated, but the new data has yet to be transmitted to the second CPU's cache, resulting in a lack of coherence across both CPUs.
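The pairing idea can be sketched in userspace C. This is only an analogue, not kernel code: it assumes C11 `atomic_thread_fence()` with release and acquire ordering as stand-ins for a write barrier on one CPU and a matching read barrier on the other. The names `payload`, `ready`, and `run_pairing_example` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int payload;        /* ordinary data, written before the flag */
static atomic_int ready;   /* flag that announces the data */

/* Writer side: store the data, fence, then set the flag. */
static void *writer_thread(void *unused)
{
    (void)unused;
    payload = 42;
    /* Roughly a write barrier: the payload store stays before the flag store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
    return NULL;
}

/* Reader side: wait for the flag, fence, then load the data. */
static void *reader_thread(void *out)
{
    while (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        ;  /* spin until the flag is visible */
    /* Roughly the matching read barrier: the payload load stays after the flag load. */
    atomic_thread_fence(memory_order_acquire);
    *(int *)out = payload;
    return NULL;
}

/* Hypothetical driver: runs both threads once, returns what the reader saw. */
int run_pairing_example(void)
{
    pthread_t w, r;
    int seen = -1;
    int rc;

    rc = pthread_create(&r, NULL, reader_thread, &seen);
    assert(rc == 0);
    rc = pthread_create(&w, NULL, writer_thread, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

If either fence is removed, the guarantee disappears: the writer's fence alone says nothing about the order in which the reader performs its two loads, which is the point of the second quoted passage.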

I hope this helps. I'd suggest reading memory-barriers.txt again with this in mind and particularly the section titled "THE EFFECTS OF THE CPU CACHE".

Source: https://stackoverflow.com/questions/16983321/how-is-a-memory-barrier-in-linux-kernel-is-used

Tags

Linux

memory

linux-kernel

memory-barriers

smp