Why is an acquire barrier needed before deleting the data in an atomically reference counted smart pointer?

不知归路 2021-02-08 05:08
Boost provides a sample atomically reference-counted shared pointer.
Here is the relevant code snippet and the explanation for the various orderings used:

      
      
        
3 answers

南旧 (original poster) 2021-02-08 05:30

I think I found a rather simple example that shows why the acquire fence is needed.
Let's assume our X looks like this:
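// The snippets assume <atomic>, <cstdlib> and using namespace std.
struct X
{
    ~X() { free(data); }      // releases the payload when X is destroyed
    void* data;
    atomic<int> refcount;     // number of owners still holding this X
};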

Let's further assume that we have two functions foo and bar that look like this (I'll inline the reference count decrements):
void foo(X* x)
{
    void* newData = generateNewData();
    free(x->data);
    x->data = newData;
    if (x->refcount.fetch_sub(1, memory_order_release) == 1)
        delete x;
}

void bar(X* x)
{
    // Do something unrelated to x
    if (x->refcount.fetch_sub(1, memory_order_release) == 1)
        delete x;
}

The delete expression runs x's destructor and then frees the memory occupied by x. Let's inline that:
void bar(X* x)
{
    // Do something unrelated to x
    if (x->refcount.fetch_sub(1, memory_order_release) == 1)
    {
        free(x->data);
        operator delete(x);
    }
}

Because there is no acquire fence, the compiler may load the pointer x->data into a register before executing the atomic decrement (as long as there is no data race, the observable effect would be the same):
void bar(X* x)
{
    void* r1 = x->data;  // hoisted above the decrement and kept in a register
    // Do something unrelated to x
    if (x->refcount.fetch_sub(1, memory_order_release) == 1)
    {
        free(r1);
        operator delete(x);
    }
}

Now let's assume that the refcount of x is 2 and that we have two threads. Thread 1 calls foo, thread 2 calls bar:

1. Thread 2 loads x->data into a register.
2. Thread 1 generates new data.
3. Thread 1 frees the "old" data.
4. Thread 1 assigns the new data to x->data.
5. Thread 1 decrements refcount from 2 to 1.
6. Thread 2 decrements refcount from 1 to 0.
7. Thread 2 frees the "old" data again instead of the new data.
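
The interleaving above can be replayed deterministically on a single thread. What follows is a minimal sketch, not the original poster's code: TrackingHeap, replay, and the fake integer addresses are hypothetical names introduced here so that the double free can be counted instead of actually corrupting the heap. The flag reload_after_decrement stands in for the effect of the acquire fence, that is, reading x->data again after the decrement rather than reusing the value cached in a register.

```cpp
#include <cstddef>
#include <set>

// Hypothetical bookkeeping allocator: hands out fake addresses and records
// frees, so the replay can detect a double free without undefined behaviour.
struct TrackingHeap {
    std::set<std::size_t> live;  // blocks currently allocated
    std::size_t next = 1;        // next fake address to hand out
    int double_frees = 0;        // frees of blocks that were not live

    std::size_t alloc() {
        live.insert(next);
        return next++;
    }

    void release(std::size_t p) {
        if (live.erase(p) == 0) {
            ++double_frees;      // p was already freed (or never allocated)
        }
    }
};

// Replays the seven steps in order. Plain ints replace the atomic counter
// because the schedule is fixed; only the order of operations matters here.
int replay(bool reload_after_decrement) {
    TrackingHeap heap;
    std::size_t data = heap.alloc();      // plays the role of x->data
    int refcount = 2;

    std::size_t r1 = data;                // 1. thread 2 caches x->data
    std::size_t new_data = heap.alloc();  // 2. thread 1 generates new data
    heap.release(data);                   // 3. thread 1 frees the old data
    data = new_data;                      // 4. thread 1 stores the new pointer
    --refcount;                           // 5. thread 1: 2 -> 1
    --refcount;                           // 6. thread 2: 1 -> 0
    if (refcount == 0) {
        // 7. thread 2 runs the destructor: stale register vs. fresh load
        heap.release(reload_after_decrement ? data : r1);
    }
    return heap.double_frees;
}
```

Under these assumptions, replay(false) reports one double free and replay(true) reports none.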

The key insight for me was that "prior writes [...] become visible in this thread" can mean something as trivial as "do not use values you cached in registers before the fence".
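
For comparison, here is a hedged sketch of the decrement with the acquire fence in place, so that the destructor's read of x->data cannot be satisfied from a value loaded before the decrement. It follows the release-decrement-plus-acquire-fence pattern discussed in the question, but Payload, release_ref, run_two_owners, and the destroyed_value probe are hypothetical names added for illustration, and X gets a constructor and a typed payload so that the result can be checked.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Payload { int value; };

// Hypothetical probe: records which payload the destructor actually saw.
std::atomic<int> destroyed_value{0};

struct X {
    Payload* data;
    std::atomic<int> refcount;

    X(Payload* p, int owners) : data(p), refcount(owners) {}

    ~X() {
        destroyed_value.store(data->value, std::memory_order_relaxed);
        delete data;
    }
};

// Drops one reference. The release ordering publishes this thread's earlier
// writes to *x; the acquire fence on the last-owner path makes the other
// owners' writes visible before the destructor reads x->data.
void release_ref(X* x) {
    int previous = x->refcount.fetch_sub(1, std::memory_order_release);
    assert(previous >= 1);  // dropping a reference nobody holds is a bug
    if (previous == 1) {
        std::atomic_thread_fence(std::memory_order_acquire);
        delete x;
    }
}

// Two owners: one swaps the payload before releasing (like foo), the other
// only releases (like bar). Returns the payload value seen by ~X().
int run_two_owners() {
    X* x = new X(new Payload{1}, 2);
    std::thread t1([x] {
        Payload* fresh = new Payload{2};
        delete x->data;
        x->data = fresh;
        release_ref(x);
    });
    std::thread t2([x] { release_ref(x); });
    t1.join();
    t2.join();
    return destroyed_value.load(std::memory_order_relaxed);
}
```

Whichever thread performs the final decrement, the destructor runs once and sees the payload installed by the first thread. Using memory_order_acq_rel on the fetch_sub itself would also work; the separate fence just confines the acquire cost to the path that actually deletes.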
    
             
                                                        
            
            
              
                