What happens when reading or writing concurrently without a mutex

Submitted by 倖福魔咒の on 2020-06-11 15:56:11

Question


In Go, a sync.Mutex or chan is used to prevent concurrent access to shared objects. However, in some cases I am only interested in the latest value of a variable or of a field of an object. Or I would like to write a value and do not care whether another goroutine overwrites it later or has just overwritten it before.

Update: TLDR; Just don't do this. It is not safe. Read the answers, comments, and linked documents!

Here are two variants, good and bad, of an example program. Both seem to produce "correct" output using the current Go runtime:

package main

import (
    "flag"
    "fmt"
    "math/rand"
    "time"
)

var bogus = flag.Bool("bogus", false, "use bogus code")

func pause() {
    time.Sleep(time.Duration(rand.Uint32()%100) * time.Millisecond)
}

func bad() {
    stop := time.After(100 * time.Millisecond)
    var name string

    // start some producers doing concurrent writes (DANGER!)
    for i := 0; i < 10; i++ {
        go func(i int) {
            pause()
            name = fmt.Sprintf("name = %d", i)
        }(i)
    }

    // start consumer that shows the current value every 10ms
    go func() {
        tick := time.Tick(10 * time.Millisecond)
        for {
            select {
            case <-stop:
                return
            case <-tick:
                fmt.Println("read:", name)
            }
        }
    }()

    <-stop
}

func good() {
    stop := time.After(100 * time.Millisecond)
    names := make(chan string, 10)

    // start some producers concurrently writing to a channel (GOOD!)
    for i := 0; i < 10; i++ {
        go func(i int) {
            pause()
            names <- fmt.Sprintf("name = %d", i)
        }(i)
    }

    // start consumer that shows the current value every 10ms
    go func() {
        tick := time.Tick(10 * time.Millisecond)
        var name string
        for {
            select {
            case name = <-names:
            case <-stop:
                return
            case <-tick:
                fmt.Println("read:", name)
            }
        }
    }()

    <-stop
}

func main() {
    flag.Parse()
    if *bogus {
        bad()
    } else {
        good()
    }
}

The expected output is as follows:

...
read: name = 3
read: name = 3
read: name = 5
read: name = 4
...

Any combination of read: and read: name = [0-9] is correct output for this program. Receiving any other string as output would be an error.

When running this program with go run --race bogus.go, the race detector reports no problems.

However, go run --race bogus.go -bogus warns of the concurrent reads and writes.

For map types and when appending to slices I always need a mutex or a similar method of protection to avoid segfaults or unexpected behavior. However, reading and writing literals (atomic values) to variables or field values seems to be safe.

Question: Which Go data types can I safely read and safely write concurrently without a mutex, without producing segfaults, and without reading garbage from memory?

Please explain why something is safe or unsafe in Go in your answer.

Update: I rewrote the example to better reflect the original code, where I had the concurrent-writes issue. The important learnings are already in the comments. I will accept an answer that summarizes these learnings in enough detail (especially regarding the Go runtime).


Answer 1:


However, in some cases I am just interested in the latest value of a variable or field of an object.

Here is the fundamental problem: What does the word "latest" mean?

Suppose that, mathematically speaking, we have a sequence of values Xi, with 0 <= i < N. Then obviously Xj is "later than" Xi if j > i. That's a nice simple definition of "latest" and is probably the one you want.

But when two separate CPUs within a single machine—including two goroutines in a Go program—are working at the same time, time itself loses meaning. We cannot say whether i < j, i == j, or i > j. So there is no correct definition for the word latest.

To solve this kind of problem, modern CPU hardware, and Go as a programming language, give us certain synchronization primitives. If CPUs A and B execute memory fence instructions, or synchronization instructions, or use whatever other hardware provisions exist, the CPUs (and/or some external hardware) will insert whatever is required for the notion of "time" to regain its meaning. That is, if the CPU uses barrier instructions, we can say that a memory load or store that was executed before the barrier is a "before" and a memory load or store that is executed after the barrier is an "after".

(The actual implementation, in some modern hardware, consists of load and store buffers that can rearrange the order in which loads and stores go to memory. The barrier instruction either synchronizes the buffers, or places an actual barrier in them, so that loads and stores cannot move across the barrier. This particular concrete implementation gives an easy way to think about the problem, but isn't complete: you should think of time as simply not existing outside the hardware-provided synchronization, i.e., all loads from, and stores to, some location are happening simultaneously, rather than in some sequential order, except for these barriers.)

In any case, Go's sync package gives you a simple high level access method to these kinds of barriers. Compiled code that executes before a mutex Lock call really does complete before the lock function returns, and the code that executes after the call really does not start until after the lock function returns.
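As a minimal sketch of that idea applied to the question's "latest value" case, a mutex can guard the single string. The latestName type and its Set and Get methods are hypothetical names introduced only for this illustration; they do not appear in the original post.

```go
package main

import (
	"fmt"
	"sync"
)

// latestName is a hypothetical helper (not from the original post) that
// guards one string with a mutex.
type latestName struct {
	mu   sync.Mutex
	name string
}

// Set stores a new value while holding the lock.
func (l *latestName) Set(s string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.name = s
}

// Get returns the stored value while holding the lock.
func (l *latestName) Get() string {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.name
}

func main() {
	var l latestName
	var wg sync.WaitGroup

	// Ten producers write concurrently; the lock orders the writes.
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			l.Set(fmt.Sprintf("name = %d", i))
		}(i)
	}

	wg.Wait()
	// Which producer ran last is not deterministic, but the value read is
	// always one of the ten complete strings.
	fmt.Println("read:", l.Get())
}
```

Here "latest" has a defined meaning: it is the value stored by whichever Set call acquired the lock last before the Get call acquired it.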

Go's channels provide the same kinds of before/after time guarantees.
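A small sketch of such a guarantee, assuming a single writer: according to the Go memory model, closing a channel is synchronized before a receive that returns because the channel is closed, so the plain write to name is visible to the reader. The function publishThenRead is a hypothetical name used only for this example.

```go
package main

import "fmt"

// publishThenRead is a hypothetical helper: one goroutine writes a plain
// variable and then closes a channel; the caller reads the variable only
// after the receive on that channel has returned.
func publishThenRead() string {
	var name string
	done := make(chan struct{})

	go func() {
		name = "name = 1" // plain write, no mutex
		close(done)       // publishes the write to whoever receives from done
	}()

	<-done      // returns only after the close
	return name // safe: the write happened before the close
}

func main() {
	fmt.Println("read:", publishThenRead()) // prints "read: name = 1"
}
```

Without the receive on done, the read of name would be a data race, exactly as in the question's bad variant.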

Go's sync/atomic package provides much lower level guarantees. In general you should avoid this in favor of the higher level channel or sync.Mutex style guarantees. (Edit to add note: You could use sync/atomic's Pointer operations here, but not with the string type directly, as Go strings are actually implemented as a header containing two separate values: a pointer, and a length. You could solve this with another layer of indirection, by updating a pointer that points to the string object. But before you even consider doing that, you should benchmark the use of the language's preferred methods and verify that these are a problem, because code that works at the sync/atomic level is hard to write and hard to debug.)
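The extra layer of indirection mentioned above could look roughly like the following sketch. It assumes Go 1.19 or later, where the generic atomic.Pointer type is available; the names current, setName, and getName are hypothetical and chosen only for this illustration.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// current holds a pointer to a string. Only the pointer is swapped
// atomically; the string it points to is never modified after publication.
var current atomic.Pointer[string]

// setName publishes a new string. The parameter s is a fresh variable per
// call, so each stored pointer refers to its own copy of the string header.
func setName(s string) {
	current.Store(&s)
}

// getName returns the most recently published string, or "" if nothing
// has been stored yet.
func getName() string {
	p := current.Load()
	if p == nil {
		return ""
	}
	return *p
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			setName(fmt.Sprintf("name = %d", i))
		}(i)
	}
	wg.Wait()

	// The winner is not deterministic, but the value is always complete.
	fmt.Println("read:", getName())
}
```

A reader never observes a pointer from one string paired with the length of another, because the two-word header is reached only through a single atomically loaded pointer. As the answer says, this should be considered only after measuring that a mutex or channel is actually a bottleneck.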




Answer 2:


Which Go data types can I safely read and safely write concurrently without a mutex, without producing segfaults, and without reading garbage from memory?

None.

It really is that simple: You cannot, under any circumstance whatsoever, read and write concurrently to anything in Go without synchronization.

(Btw: Your "correct" program is not correct, it is racy and even if you get rid of the race condition it would not deterministically produce the output.)




Answer 3:


Why can't you use channels?

package main

import (
    "fmt"
    "sync"
)

func main() {

    var wg sync.WaitGroup // wait group to close channel
    var buffer int = 1    // buffer of the channel

    // channel to get the share data
    cName := make(chan string, buffer)
    for i := 0; i < 10; i++ {
        wg.Add(1) // add to wait group
        go func(i int) {
            cName <- fmt.Sprintf("name = %d", i)
            wg.Done() // decrease wait group.
        }(i)

    }

    go func() {
        wg.Wait()    // wait for the wait group counter to reach 0
        close(cName) // close the channel so the range loop below ends
    }()

    // process all the data
    for n := range cName {
        fmt.Println("read:", n)
    }

}

The above code prints output such as the following (the order can vary between runs):

read: name = 0
read: name = 5
read: name = 1
read: name = 2
read: name = 3
read: name = 4
read: name = 7
read: name = 6
read: name = 8
read: name = 9

https://play.golang.org/p/R4n9ssPMOeS

Article about channels



Source: https://stackoverflow.com/questions/61914041/what-happens-when-reading-or-writing-concurrently-without-a-mutex
