Why is an acquire barrier needed before deleting the data in an atomically reference counted smart pointer?

痴心易碎 submitted on 2019-12-03 06:27:25

Consider two threads, each holding one reference to the object, which are the last two references:

------------------------------------------------------------
        Thread 1                              Thread 2
------------------------------------------------------------
   // play with x here

   fetch_sub(...)                            
                                            fetch_sub(...)
   // nothing
                                            delete x;

You have to ensure that any changes Thread 1 made to the object in // play with x here are visible to Thread 2 when it calls delete x;. For this you need an acquire fence before the delete, which, together with the memory_order_release on the fetch_sub() calls, guarantees that Thread 1's changes happen-before the delete and are therefore visible to Thread 2.
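A minimal sketch of the diagram above, assuming a hypothetical Payload type with an embedded counter (the names Payload, release_ref and run_two_owners are made up for illustration and are not from any library). Both threads decrement with memory_order_release, and only the thread that drops the last reference issues the acquire fence before reading and deleting the object:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical payload: `value` plays the role of "x" in the diagram.
struct Payload {
    int value = 0;              // plain, non-atomic data
    std::atomic<int> refs{2};   // the last two references
};

// Drop one reference. The thread that removes the last reference deletes
// the object and returns the value it observed; any other caller gets -1.
int release_ref(Payload* p) {
    assert(p != nullptr);
    if (p->refs.fetch_sub(1, std::memory_order_release) == 1) {
        // Pairs with the release decrement done by the other owner, so
        // that owner's earlier writes are visible before we touch *p.
        std::atomic_thread_fence(std::memory_order_acquire);
        int seen = p->value;
        delete p;
        return seen;
    }
    return -1;
}

// Thread 1 "plays with x" and drops its reference; Thread 2 only drops its
// reference. Whichever thread ends up deleting must see value == 42.
int run_two_owners() {
    Payload* p = new Payload;
    int seen1 = -1;
    int seen2 = -1;
    std::thread t1([&] { p->value = 42; seen1 = release_ref(p); });
    std::thread t2([&] { seen2 = release_ref(p); });
    t1.join();
    t2.join();
    return seen1 != -1 ? seen1 : seen2;
}
```

Calling run_two_owners() should yield 42 regardless of which thread performs the final decrement. Without the fence, the read of p->value in the deleting thread would not be ordered after Thread 1's write.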

From http://en.cppreference.com/w/cpp/atomic/memory_order:

memory_order_acquire -- A load operation with this memory order performs the acquire operation on the affected memory location: prior writes made to other memory locations by the thread that did the release become visible in this thread.

...

Release-Acquire ordering

If an atomic store in thread A is tagged std::memory_order_release and an atomic load in thread B from the same variable is tagged std::memory_order_acquire, all memory writes (non-atomic and relaxed atomic) that happened-before the atomic store from the point of view of thread A, become visible side-effects in thread B, that is, once the atomic load is completed, thread B is guaranteed to see everything thread A wrote to memory.

The synchronization is established only between the threads releasing and acquiring the same atomic variable. Other threads can see different order of memory accesses than either or both of the synchronized threads.

On strongly-ordered systems (x86, SPARC TSO, IBM mainframe), release-acquire ordering is automatic for the majority of operations. No additional CPU instructions are issued for this synchronization mode, only certain compiler optimizations are affected (e.g. the compiler is prohibited from moving non-atomic stores past the atomic store-release or perform non-atomic loads earlier than the atomic load-acquire). On weakly-ordered systems (ARM, Itanium, PowerPC), special CPU load or memory fence instructions have to be used.

This means that a release publishes the current thread's earlier writes to any thread that later performs an acquire on the same atomic variable, and that acquire makes the published writes visible to the acquiring thread.

On strongly-ordered systems, this is not as important. I don't think these orderings even generate extra instructions there: as the quote above says, the hardware already provides release-acquire ordering for most operations, so only compiler reordering is restricted. But on weakly-ordered systems, while the atomic operations themselves are well defined, there could be pending operations to other parts of memory.
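To make the release-acquire pairing from the quote concrete, here is a small sketch (the function name message_passing and its local variables are hypothetical). One thread writes a non-atomic value and then does a store-release on a flag; the other spins on a load-acquire of the same flag and only then reads the value:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Returns the value the consumer thread observed.
int message_passing() {
    int data = 0;                     // non-atomic payload
    std::atomic<bool> ready{false};   // the variable both threads synchronize on
    int seen = -1;

    std::thread producer([&] {
        data = 42;                                      // ordinary write
        ready.store(true, std::memory_order_release);   // publish it
    });
    std::thread consumer([&] {
        // Spin until the release store becomes visible.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        // The acquire load read the value written by the release store,
        // so the write to `data` happens-before this read.
        assert(data == 42);
        seen = data;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

If both orderings were memory_order_relaxed instead, the read of data would be a data race, even on hardware where the generated code happens to behave.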

So, let's say threads A and B both share some data D.

  1. A gets some lock and does things to D
  2. A releases the lock
  3. B releases the lock, finds a ref count of 0 and so decides to delete D
  4. B deletes D
  5. ... the writes from #1 are not visible to B yet, so bad things happen.

With an acquire thread fence before the delete, the deleting thread synchronizes with the release operations that the other threads performed on the reference count. So when the delete happens, it sees what A did in #1.
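The same idea in smart-pointer form, as a rough sketch only: D, Handle and share_between_a_and_b are hypothetical names, and Handle is deliberately minimal (no move support, no assignment), not a stand-in for std::shared_ptr. Copies increment with relaxed ordering, the destructor decrements with release, and the acquire fence runs only on the path that deletes:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical shared data "D" with an embedded reference count.
struct D {
    std::atomic<int> refs{1};
    int state = 0;    // non-atomic field mutated by thread A
    int* observed;    // where the destructor reports what it saw
    explicit D(int* out) : observed(out) {}
    ~D() { *observed = state; }
};

// Minimal intrusive handle, for illustration only.
class Handle {
    D* d_;
public:
    explicit Handle(D* d) : d_(d) {}
    Handle(const Handle& other) : d_(other.d_) {
        // A new reference is created from an existing one, so no
        // ordering is required on the increment.
        d_->refs.fetch_add(1, std::memory_order_relaxed);
    }
    Handle& operator=(const Handle&) = delete;
    ~Handle() {
        if (d_->refs.fetch_sub(1, std::memory_order_release) == 1) {
            // Last owner: make every other owner's writes visible
            // before the destructor of D runs.
            std::atomic_thread_fence(std::memory_order_acquire);
            delete d_;
        }
    }
    D* operator->() const { return d_; }
};

// A mutates D and drops its handle, B just drops its handle. Whichever
// thread destroys D must observe A's write.
int share_between_a_and_b() {
    int observed = -1;
    std::thread a;
    std::thread b;
    {
        Handle owner(new D(&observed));
        a = std::thread([](Handle h) { h->state = 7; }, owner);
        b = std::thread([](Handle) {}, owner);
    }   // the creating thread drops its own reference here
    a.join();
    b.join();
    return observed;
}
```

Putting the fence inside the if keeps the acquire cost off every decrement that is not the last one. Using memory_order_acq_rel on the fetch_sub itself would also be correct, at the price of an acquire on every decrement.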
