Why is an acquire barrier needed before deleting the data in an atomically reference counted smart pointer?

不知归路 2021-02-08 05:08

Boost provides a sample atomically reference counted shared pointer.

Here is the relevant code snippet and the explanation for the various orderings used:
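The example in question is essentially the reference-counting sample from the Boost.Atomic documentation; roughly the following (details such as includes and formatting may differ slightly from the exact version in the docs):

    #include <boost/intrusive_ptr.hpp>
    #include <boost/atomic.hpp>

    class X {
    public:
        typedef boost::intrusive_ptr<X> pointer;
        X() : refcount_(0) {}

    private:
        mutable boost::atomic<int> refcount_;

        // Increment uses memory_order_relaxed: a new reference can only be
        // formed from an existing one, so no extra synchronization is needed.
        friend void intrusive_ptr_add_ref(const X* x)
        {
            x->refcount_.fetch_add(1, boost::memory_order_relaxed);
        }

        // Decrement uses memory_order_release, and an acquire fence is issued
        // before deleting the object -- this pairing is what the question is
        // about.
        friend void intrusive_ptr_release(const X* x)
        {
            if (x->refcount_.fetch_sub(1, boost::memory_order_release) == 1) {
                boost::atomic_thread_fence(boost::memory_order_acquire);
                delete x;
            }
        }
    };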



        
3 Answers
  •  南旧 (OP)
     2021-02-08 05:30

    I think I found a rather simple example that shows why the acquire fence is needed.

    Let's assume our X looks like this:

    struct X
    {
        ~X() { free(data); }
        void* data;
        atomic<int> refcount;
    };
    

    Let's further assume that we have two functions foo and bar that look like this (I'll inline the reference count decrements):

    void foo(X* x)
    {
        void* newData = generateNewData();
        free(x->data);
        x->data = newData;
        if (x->refcount.fetch_sub(1, memory_order_release) == 1)
            delete x;
    }
    
    void bar(X* x)
    {
        // Do something unrelated to x
        if (x->refcount.fetch_sub(1, memory_order_release) == 1)
            delete x;
    }
    

    The delete expression will run x's destructor and then free the memory occupied by x. Let's inline that:

    void bar(X* x)
    {
        // Do something unrelated to x
        if (x->refcount.fetch_sub(1, memory_order_release) == 1)
        {
            free(x->data);
            operator delete(x);
        }
    }
    

    Because there is no acquire fence, the compiler could decide to load the pointer stored in x->data into a register before executing the atomic decrement (as long as there is no data race, the observable effect would be the same):

    void bar(X* x)
    {
        register void* r1 = x->data;
        // Do something unrelated to x
        if (x->refcount.fetch_sub(1, memory_order_release) == 1)
        {
            free(r1);
            operator delete(x);
        }
    }
    

    Now let's assume that the refcount of x is 2 and that we have two threads. Thread 1 calls foo, thread 2 calls bar:

    1. Thread 2 loads x->data to a register.
    2. Thread 1 generates new data.
    3. Thread 1 frees the "old" data.
    4. Thread 1 assigns the new data to x->data.
    5. Thread 1 decrements refcount from 2 to 1.
    6. Thread 2 decrements refcount from 1 to 0.
    7. Thread 2 frees the "old" data again instead of the new data.

    The key insight for me was that "prior writes [...] become visible in this thread" can mean something as trivial as "do not use values you cached in registers before the fence".
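    For completeness, here is a sketch of how the acquire fence closes this hole, using the same inlined form of bar as above. With a fence carrying memory_order_acquire between the decrement and the use of x->data, the compiler can no longer move that load ahead of the fence, and the fence synchronizes with the release decrement done in foo, so the new value of x->data is the one that gets freed:

    void bar(X* x)
    {
        // Do something unrelated to x
        if (x->refcount.fetch_sub(1, memory_order_release) == 1)
        {
            // The fence pairs with the release decrement performed by the
            // other thread: everything that thread wrote before its fetch_sub
            // (including the new x->data) is visible here, and the compiler
            // may not reuse a value of x->data loaded before this point.
            atomic_thread_fence(memory_order_acquire);
            free(x->data);
            operator delete(x);
        }
    }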
