Why is atomic.StoreUint32 preferred over a normal assignment in sync.Once?

问题

While reading the source codes of Go, I have a question about the code in src/sync/once.go:

func (o *Once) Do(f func()) {
    // Note: Here is an incorrect implementation of Do:
    //
    //  if atomic.CompareAndSwapUint32(&o.done, 0, 1) {
    //      f()
    //  }
    //
    // Do guarantees that when it returns, f has finished.
    // This implementation would not implement that guarantee:
    // given two simultaneous calls, the winner of the cas would
    // call f, and the second would return immediately, without
    // waiting for the first's call to f to complete.
    // This is why the slow path falls back to a mutex, and why
    // the atomic.StoreUint32 must be delayed until after f returns.

    if atomic.LoadUint32(&o.done) == 0 {
        // Outlined slow-path to allow inlining of the fast-path.
        o.doSlow(f)
    }
}

func (o *Once) doSlow(f func()) {
    o.m.Lock()
    defer o.m.Unlock()
    if o.done == 0 {
        defer atomic.StoreUint32(&o.done, 1)
        f()
    }
}

why is ataomic.StoreUint32 used, rather than, say o.done = 1? Are these not equivalent? What are the differences?

Must we use the atomic operation (atomic.StoreUint32) to make sure that other goroutines can observe the effect of "f()" before o.done is set to 1 is observed on a machine with weak memory model?

回答1:

Remember, unless you are writing the assembly by hand, you are not programming to your machine's memory model, you are programming to Go's memory model. This means that even if primitive assignments are atomic with your architecture, Go requires the use of the atomic package to ensure correctness across alls supported architectures.

Access to the done flag outside of the mutex only needs to be safe, not strictly ordered, so atomic operations can be used instead of always obtaining a lock with a mutex. This is an optimization to make the fast path as efficient as possible, allowing sync.Once to be used in hot paths.

The mutex used for doSlow is for mutual exclusion within that function alone, to ensure that only one caller ever makes it to f() before the done flag is set. The flag is written using atomic.StoreUint32, because it may happen concurrently with atomic.LoadUint32 outside of the critical section protected by the mutex.

Reading the done field concurrently with writes, even atomic writes, is a data race. Just because the field is read atomically, does not mean you can use normal assignment to write it, hence the flag is checked first with atomic.LoadUint32 and written with atomic.StoreUint32

The direct read of done within doSlow is safe, because it is protected from concurrent writes by the mutex. Reading the value concurrently with atomic.LoadUint32 is safe because both are read operations.

回答2:

can we use "defer func() {o.done=1}()" to replace "defer atomic.StoreUint32(&o.done, 1)" ?

No.

来源：https://stackoverflow.com/questions/55964014/why-sync-once-using-atomic-in-doslow

标签