Does a memory barrier ensure that the cache coherence has been completed?

眼角桃花 2020-11-28 04:01

Say I have two threads that manipulate the global variable x. Each thread (or each core I suppose) will have a cached copy of x.

Now say th

4 Answers
  •  情话喂你
    2020-11-28 04:52

    No, a memory barrier does not ensure that cache coherence has been "completed". It often involves no coherence operation at all and can be performed speculatively or as a no-op.

    It only enforces the ordering semantics described in the barrier. For example, an implementation might just put a marker in the store queue such that store-to-load forwarding doesn't occur for stores older than the marker.

    Intel, in particular, already has a strong memory model for normal loads and stores (the kind that compilers generate and that you'd use in assembly) where the only possible re-ordering is later loads passing earlier stores. In the terminology of SPARC memory barriers, every barrier other than StoreLoad is already a no-op.
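
The one reordering that x86 does allow (a later load passing an earlier store) is what the classic "store buffering" litmus test probes. Below is a minimal C++ sketch of that test, not code from the original answer: the names `Observed`, `run_store_buffering_once`, and `count_both_zero` are made up for illustration. Each thread stores to its own variable, executes a sequentially consistent fence (the StoreLoad barrier), and then loads the other thread's variable. On x86, compilers typically implement this fence with `mfence` or a dummy LOCKed instruction, though that is an implementation detail and not something the language guarantees.

```cpp
#include <atomic>
#include <thread>

// Values read by the two threads in one run of the litmus test.
struct Observed {
    int r1;
    int r2;
};

// Each thread stores to its own variable and then loads the other one.
// The seq_cst fence between the store and the load acts as the StoreLoad
// barrier: it orders the two accesses, it does not "push" anything.
Observed run_store_buffering_once() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r1 = -1;
    int r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    return {r1, r2};
}

// Counts the runs in which both threads read 0. With the fences in place
// the C++ memory model forbids that outcome, so the count stays at 0.
int count_both_zero(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        Observed o = run_store_buffering_once();
        if (o.r1 == 0 && o.r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

If the two fences are removed, the outcome `r1 == 0 && r2 == 0` becomes legal, because each store may still be sitting in its core's store buffer when the other core's load executes. Whether you actually observe it depends on timing, so a short run without fences proves nothing either way.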

    In practice, the interesting barriers on x86 are attached to LOCKed instructions, and the execution of such an instruction doesn't necessarily involve any cache coherence at all. If the line is already in an exclusive state, the CPU may simply execute the instruction, making sure not to release the exclusive state of the line while the operation is in progress (i.e., between the read of the argument and writeback of the result) and then only deal with preventing store-to-load forwarding from breaking the total ordering that LOCK instructions come with. Currently they do that by draining the store queue, but in future processors even that could be speculative.
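
As a rough illustration of a barrier attached to an operation, here is a C++ sketch using an atomic read-modify-write. The function name `count_with_locked_rmw` is hypothetical. On x86, compilers usually lower `fetch_add` to a LOCKed instruction such as `lock xadd` or `lock add`, but that is typical code generation and not something the C++ standard promises.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Several threads increment one shared counter with an atomic
// read-modify-write. Each increment is indivisible, so no update is lost,
// regardless of which core currently owns the cache line.
long count_with_locked_rmw(int num_threads, int increments_per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // seq_cst RMW: the operation and its ordering guarantee
                // come as one unit, like a LOCKed instruction on x86.
                counter.fetch_add(1, std::memory_order_seq_cst);
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return counter.load(std::memory_order_seq_cst);
}
```

Calling `count_with_locked_rmw(4, 10000)` returns 40000. Nothing in the code asks for the result to be broadcast to other cores; the hardware only has to keep each increment atomic and correctly ordered.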

    What a memory barrier or barrier+op does is ensure that the operation is seen by other agents in a relative order that obeys all the restrictions of the barrier. That usually doesn't involve pushing the result to other CPUs as a coherence operation, as your question implies.
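
The "relative order" guarantee can be sketched with a release/acquire handoff in C++. This is an illustration under assumed names (`publish_and_consume`, `payload`, `ready`), not code from the question. The release store promises nothing about when the consumer will see `ready` become true; it only promises that once the consumer does see it, the earlier write to `payload` is visible too.

```cpp
#include <atomic>
#include <thread>

// The producer writes plain data and then sets a flag with release
// semantics. The consumer spins on the flag with acquire semantics and
// then reads the data.
int publish_and_consume() {
    int payload = 0;                  // ordinary, non-atomic data
    std::atomic<bool> ready{false};   // flag that publishes the data
    int seen = -1;

    std::thread producer([&] {
        payload = 42;
        // Release: the write to payload cannot be reordered after this.
        ready.store(true, std::memory_order_release);
    });
    std::thread consumer([&] {
        // Acquire: the read of payload cannot be reordered before this.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });
    producer.join();
    consumer.join();
    return seen;
}
```

The function always returns 42. How long the consumer spins is up to the hardware and the scheduler; the barrier constrains ordering, not latency.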
