Why does adding concurrency slow down this golang code?


I've got a bit of Go code that I've been tinkering with to answer a little curiosity of mine related to a video game my brother-in-law plays.

Essentially, the code runs a large number of simulations, each consisting of one million interactions that succeed with probability 0.0003, sums the successes, and times the whole run on a single CPU versus spreading the work across all CPUs with goroutines.

4 Answers
  • 2020-12-12 19:43

    My results, which show a substantial speedup from concurrency with 4 CPUs versus 1 CPU:

    Intel Core 2 Quad CPU Q8300 @ 2.50GHz x 4

    Source code: the question's UPDATE (01/12/13 18:05) revision

    $ go version
    go version devel +adf4e96e9aa4 Thu Jan 10 09:57:01 2013 +1100 linux/amd64
    
    $ time  go run temp.go
    Number of CPUs:  1
    real    0m30.305s
    user    0m30.210s
    sys     0m0.044s
    
    $ time  go run temp.go
    Number of CPUs:  4
    real    0m9.980s
    user    0m35.146s
    sys     0m0.204s
    
  • 2020-12-12 19:44

    Your code is sampling a binomial random variable, B(N, p) where N is the number of trials (here 1M), and p is the probability of a successful individual trial (here 0.0003).

    One way to do this is to build a table T of cumulative probabilities, where T[i] contains the probability that the total number of successes is less than or equal to i. To produce a sample, you then draw a uniform random number (via rand.Float64) and find the first index in the table whose probability is greater than or equal to it.

    It's a little more complicated here because you've got a really big N and a fairly small p, so if you try to build the full table you run into trouble with very small numbers and floating-point accuracy. But you can build a smaller table (say of size 1,000) and sample from it 1,000 times to get your 1 million trials.

    Here's some code that does all this. It's not too elegant (the 1000 is hard-coded), but it generates 1000 simulations in less than a second on my old laptop. It's easy to optimise further, for example by lifting the construction of the BinomialSampler out of the loop, or by using binary search rather than a linear scan to find the table index; a sketch of both follows the code below.

    package main
    
    import (
        "fmt"
        "math"
        "math/rand"
    )
    
    // BinomialSampler is a table of cumulative probabilities:
    // bs[i] holds P(X <= i) for X ~ B(N, p).
    type BinomialSampler []float64
    
    // Sample draws one value by inverse-CDF lookup: pick a uniform
    // number and scan for the first index whose cumulative
    // probability reaches it.
    func (bs BinomialSampler) Sample() int {
        r := rand.Float64()
        for i := 0; i < len(bs); i++ {
            if bs[i] >= r {
                return i
            }
        }
        // Floating-point rounding can leave the last entry slightly
        // below 1; fall back to the largest possible value.
        return len(bs) - 1
    }
    
    // NewBinomialSampler builds the cumulative probability table for
    // B(N, p), accumulating C(N, i) * p^i * (1-p)^(N-i) term by term.
    func NewBinomialSampler(N int, p float64) BinomialSampler {
        r := BinomialSampler(make([]float64, N+1))
        T := 0.0
        choice := 1.0 // C(N, i), updated incrementally
        for i := 0; i <= N; i++ {
            T += choice * math.Pow(p, float64(i)) * math.Pow(1-p, float64(N-i))
            r[i] = T
            choice *= float64(N-i) / float64(i+1)
        }
        return r
    }
    
    // WowSample simulates N trials with success probability p by
    // summing N/1000 draws from a B(1000, p) sampler.
    func WowSample(N int, p float64) int {
        if N%1000 != 0 {
            panic("N must be a multiple of 1000")
        }
        bs := NewBinomialSampler(1000, p)
        r := 0
        for i := 0; i < N; i += 1000 {
            r += bs.Sample()
        }
        return r
    }
    
    func main() {
        for i := 0; i < 1000; i++ {
            fmt.Println(WowSample(1000000, 0.0003))
        }
    }
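
    This version does a linear scan through the table for every sample. As a rough sketch of the optimisations mentioned above, and not part of the original answer, the sampler can be built once outside the loop and the scan replaced with a binary search via sort.SearchFloat64s (which would also require adding "sort" to the imports); the names SampleFast and WowSampleReusing are made up for illustration:
    
    // SampleFast uses binary search over the cumulative table instead of
    // a linear scan. sort.SearchFloat64s returns the first index i with
    // bs[i] >= r, or len(bs) if there is none.
    func (bs BinomialSampler) SampleFast() int {
        r := rand.Float64()
        if i := sort.SearchFloat64s(bs, r); i < len(bs) {
            return i
        }
        return len(bs) - 1
    }
    
    // WowSampleReusing takes an already-built sampler so the table is
    // constructed once rather than once per call.
    func WowSampleReusing(bs BinomialSampler, N int) int {
        r := 0
        for i := 0; i < N; i += 1000 {
            r += bs.SampleFast()
        }
        return r
    }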
    
  • 2020-12-12 19:45

    The issue seems to come from your use of rand.Float64(), which uses a shared global object with a Mutex lock on it.

    Instead, if you create a separate generator with rand.New() for each CPU, pass it through to the interaction functions, and call its Float64() method, there's a massive improvement.
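
    To make the contention point concrete, here is a minimal sketch, not taken from the question's code, contrasting the locked global source with a per-goroutine generator (it assumes "math/rand" is imported; countHitsShared and countHitsLocal are hypothetical stand-ins for the interaction loop):
    
    // Every goroutine calling the package-level rand.Float64() goes
    // through the same mutex-protected global source.
    func countHitsShared(n int, p float64) int {
        hits := 0
        for i := 0; i < n; i++ {
            if rand.Float64() < p { // contended global lock
                hits++
            }
        }
        return hits
    }
    
    // A generator created per goroutine has no shared lock to fight over.
    func countHitsLocal(n int, p float64, generator *rand.Rand) int {
        hits := 0
        for i := 0; i < n; i++ {
            if generator.Float64() < p { // no contention
                hits++
            }
        }
        return hits
    }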


    Update, showing the changes to the new example code in the question, which now uses rand.New():

    The test() function was modified to either send its results on the given channel, or return them directly when the channel is nil.

    func test(n int, c chan []int) []int {
        // Each call gets its own source and generator, so concurrent
        // tests never contend on the global rand lock.
        source := rand.NewSource(time.Now().UnixNano())
        generator := rand.New(source)
        simulations := make([]int, n)
        for i := range simulations {
            for _, v := range simulation(NUMBER_OF_INTERACTIONS, generator) {
                simulations[i] += v
            }
        }
        // With no channel, return the results directly (the single-CPU
        // path); otherwise send them to the caller.
        if c == nil {
            return simulations
        }
        c <- simulations
        return nil
    }
    

    The main() function was updated to run both versions and print the timed results.

    func main() {
        rand.Seed(time.Now().UnixNano())
    
        nCPU := runtime.NumCPU()
        runtime.GOMAXPROCS(nCPU)
        fmt.Println("Number of CPUs: ", nCPU)
    
        start := time.Now()
        fmt.Println("Successful interactions: ", len(test(NUMBER_OF_SIMULATIONS, nil)))
        fmt.Println(time.Since(start))
    
        start = time.Now()
        // Split the simulations evenly across one goroutine per CPU,
        // each sending its results back on its own channel.
        tests := make([]chan []int, nCPU)
        for i := range tests {
            c := make(chan []int)
            go test(NUMBER_OF_SIMULATIONS/nCPU, c)
            tests[i] = c
        }
    
        // Concatenate the test results
        results := make([]int, NUMBER_OF_SIMULATIONS)
        for i, c := range tests {
            start := (NUMBER_OF_SIMULATIONS/nCPU) * i
            stop := (NUMBER_OF_SIMULATIONS/nCPU) * (i+1)
            copy(results[start:stop], <-c)
        }
        fmt.Println("Successful interactions: ", len(results))
        fmt.Println(time.Since(start))
    }
    

    The output I received:

    > Number of CPUs:  2 
    >
    > Successful interactions:  1000 
    > 1m20.39959s
    >
    > Successful interactions:  1000
    > 41.392299s
    
  • 2020-12-12 20:00

    Testing your code on my Linux quad-core i7 laptop I get the results below (originally linked as a Google Spreadsheet of timings, with a screenshot).

    This shows that, under Linux at least, the scaling is very nearly linear per core.

    I think there may be two reasons why you aren't seeing this.

    The first is that your MacBook Air only has 2 real cores. It has 4 hyperthreads, though, which is why it reports 4 as the maximum number of CPUs. A hyperthread typically only gives an extra 15% of performance over a single core, rather than the 100% you might expect. So stick to benchmarking with only 1 or 2 CPUs on the MacBook Air (see the sketch after these two points)!

    The other reason might be OS X thread performance compared to Linux; they use different threading models, which may be affecting performance.
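
    As a small illustration of the first point, and not part of the original answer, the benchmark can be pinned to the two physical cores by capping GOMAXPROCS before the timed work starts (running with the GOMAXPROCS=2 environment variable set achieves the same thing without a code change):
    
    package main
    
    import (
        "fmt"
        "runtime"
    )
    
    func main() {
        // Cap the scheduler at 2 OS threads running Go code, matching the
        // MacBook Air's 2 physical cores rather than its 4 hyperthreads.
        runtime.GOMAXPROCS(2)
        fmt.Println("Benchmarking with GOMAXPROCS =", runtime.GOMAXPROCS(0))
        // ... run the timed simulations here ...
    }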
