Is the following singleton implementation data-race free?
static std::atomic<Tp *> m_instance;
...
static Tp &
instance()
{
if (!m_instance.lo
That implementation is not race-free. The store of the singleton pointer uses release semantics, but a release store only synchronizes with an acquire load that reads the stored value, and the only acquire load here is the one already guarded by the mutex.
The outer relaxed load can therefore read a non-null pointer with no guarantee that the locking thread's initialization of the singleton is visible yet, so the caller may end up using a partially constructed object.
The acquire that is guarded by the lock, on the other hand, is redundant. It would synchronize with a release store on another thread, but at that point (thanks to the mutex) the only thread that can possibly store is the current thread, and any store made by an earlier lock holder is already ordered before this load by the mutex itself. That load would not even need to be atomic, since no store can happen concurrently on another thread; with std::atomic, a relaxed load is the way to express that.
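One way to apply both points is sketched below: the outer load becomes an acquire, and the load under the lock is relaxed. This is a minimal sketch, not the asker's full class. The `Singleton` wrapper, the `m_mutex` member, and the `Widget` type are hypothetical names introduced for illustration, and the instance is deliberately never deleted.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical payload type, standing in for the question's Tp.
struct Widget {
    int value = 42;
};

// Hypothetical wrapper; only m_instance and instance() come from the question.
template <typename Tp>
class Singleton {
public:
    static Tp &instance()
    {
        // Acquire pairs with the release store below: seeing a non-null
        // pointer guarantees the object's construction is visible too.
        Tp *p = m_instance.load(std::memory_order_acquire);
        if (!p) {
            std::lock_guard<std::mutex> lock(m_mutex);
            // Relaxed is enough here: the mutex already orders this load
            // after any store made by a previous lock holder.
            p = m_instance.load(std::memory_order_relaxed);
            if (!p) {
                p = new Tp();  // intentionally leaked in this sketch
                m_instance.store(p, std::memory_order_release);
            }
        }
        return *p;
    }

private:
    static std::atomic<Tp *> m_instance;
    static std::mutex m_mutex;  // assumed name; not shown in the question
};

template <typename Tp>
std::atomic<Tp *> Singleton<Tp>::m_instance{nullptr};

template <typename Tp>
std::mutex Singleton<Tp>::m_mutex;
```

Calling `Singleton<Widget>::instance()` from several threads should then always return a reference to the same, fully constructed object.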
See Anthony Williams' series on C++0x multithreading.