Question
I am trying to speed up a process that slows down my main thread by distributing it across at least two different cores.
The reason I think I can pull this off is that each of the individual operations is independent, requiring only two points and a float.
However, my first stab at it has the code running significantly slower when doing queue.async vs queue.sync, and I have no clue why!
Here is the code running synchronously:
var block = UnsafeMutablePointer<Datas>.allocate(capacity: 0)
var outblock = UnsafeMutablePointer<Decimal>.allocate(capacity: 0)

func initialise()
{
    outblock = UnsafeMutablePointer<Decimal>.allocate(capacity: testWith * 4 * 2)
    block = UnsafeMutablePointer<Datas>.allocate(capacity: particles.count)
}
func update()
{
    var i = 0
    for part in particles
    {
        part.update()
        let x1 = part.data.p1.x; let y1 = part.data.p1.y
        let x2 = part.data.p2.x; let y2 = part.data.p2.y
        let w = part.data.size * rectScale
        let w2 = part.data.size * rectScale
        let dy = y2 - y1; let dx = x2 - x1
        let length = sqrt(dy * dy + dx * dx)
        let calcx = (-(y2 - y1) / length)
        let calcy = ((x2 - x1) / length)
        let calcx1 = calcx * w
        let calcy1 = calcy * w
        let calcx2 = calcx * w2
        let calcy2 = calcy * w2
        outblock[i]   = x1 + calcx1
        outblock[i+1] = y1 + calcy1
        outblock[i+2] = x1 - calcx1
        outblock[i+3] = y1 - calcy1
        outblock[i+4] = x2 + calcx2
        outblock[i+5] = y2 + calcy2
        outblock[i+6] = x2 - calcx2
        outblock[i+7] = y2 - calcy2
        i += 8
    }
}
Here is my attempt at distributing the workload among multiple cores:
let queue = DispatchQueue(label: "construction_worker_1", attributes: .concurrent)
let blocky = block
let oblocky = outblock
for i in 0..<particles.count
{
    particles[i].update()
    block[i] = particles[i].data // Copy the raw data into a thread-safe format
    queue.async {
        let x1 = blocky[i].p1.x; let y1 = blocky[i].p1.y
        let x2 = blocky[i].p2.x; let y2 = blocky[i].p2.y
        let w = blocky[i].size * rectScale
        let w2 = blocky[i].size * rectScale
        let dy = y2 - y1; let dx = x2 - x1
        let length = sqrt(dy * dy + dx * dx)
        let calcx = (-(y2 - y1) / length)
        let calcy = ((x2 - x1) / length)
        let calcx1 = calcx * w
        let calcy1 = calcy * w
        let calcx2 = calcx * w2
        let calcy2 = calcy * w2
        let writeIndex = i * 8
        oblocky[writeIndex]   = x1 + calcx1
        oblocky[writeIndex+1] = y1 + calcy1
        oblocky[writeIndex+2] = x1 - calcx1
        oblocky[writeIndex+3] = y1 - calcy1
        oblocky[writeIndex+4] = x2 + calcx2
        oblocky[writeIndex+5] = y2 + calcy2
        oblocky[writeIndex+6] = x2 - calcx2
        oblocky[writeIndex+7] = y2 - calcy2
    }
}
I really have no clue why this slowdown is happening! I am using UnsafeMutablePointer so the data is thread safe, and I am ensuring that no variable can ever be read or written by multiple threads at the same time.
What is going on here?
Answer 1:
As described in Performing Loop Iterations Concurrently, there is overhead with each block dispatched to a background queue. So you will want to “stride” through your array, letting each iteration process multiple data points, not just one.
Also, dispatch_apply, called concurrentPerform in Swift 3 and later, is designed for performing loops in parallel, and it's optimized for the particular device's cores. Combined with striding, you should achieve some performance benefit:
DispatchQueue.global(qos: .userInitiated).async {
    let stride = 100
    let iterations = (particles.count + stride - 1) / stride  // round up so the last partial stride is processed
    DispatchQueue.concurrentPerform(iterations: iterations) { iteration in
        let start = iteration * stride
        let end = min(start + stride, particles.count)
        for i in start ..< end {
            particles[i].update()
            block[i] = particles[i].data // Copy the raw data into a thread-safe format
            let x1 = blocky[i].p1.x; let y1 = blocky[i].p1.y
            let x2 = blocky[i].p2.x; let y2 = blocky[i].p2.y
            let w = blocky[i].size * rectScale
            let w2 = blocky[i].size * rectScale
            let dy = y2 - y1; let dx = x2 - x1
            let length = hypot(dy, dx)
            let calcx = -dy / length
            let calcy = dx / length
            let calcx1 = calcx * w
            let calcy1 = calcy * w
            let calcx2 = calcx * w2
            let calcy2 = calcy * w2
            let writeIndex = i * 8
            oblocky[writeIndex]   = x1 + calcx1
            oblocky[writeIndex+1] = y1 + calcy1
            oblocky[writeIndex+2] = x1 - calcx1
            oblocky[writeIndex+3] = y1 - calcy1
            oblocky[writeIndex+4] = x2 + calcx2
            oblocky[writeIndex+5] = y2 + calcy2
            oblocky[writeIndex+6] = x2 - calcx2
            oblocky[writeIndex+7] = y2 - calcy2
        }
    }
}
Note that there is no queue.async inside the loop: concurrentPerform already runs the iterations in parallel, and dispatching each body asynchronously again would reintroduce exactly the per-block overhead we're trying to avoid.
You should experiment with different stride values and see how the performance changes.
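To compare stride values empirically, a small timing harness works well. This is a minimal sketch, not code from the question: the function name timeStride and the sqrt placeholder workload are illustrative stand-ins for the real per-particle math.

```swift
import Foundation
import Dispatch

// Hypothetical micro-benchmark: times one striding configuration so that
// different stride values can be compared on the actual device.
func timeStride(_ stride: Int, count: Int) -> Double {
    let start = DispatchTime.now()
    let iterations = (count + stride - 1) / stride  // round up to cover the tail
    DispatchQueue.concurrentPerform(iterations: iterations) { iteration in
        let lo = iteration * stride
        let hi = min(lo + stride, count)
        for i in lo ..< hi {
            _ = sqrt(Double(i))  // placeholder for the real per-particle work
        }
    }
    let end = DispatchTime.now()
    return Double(end.uptimeNanoseconds - start.uptimeNanoseconds) / 1_000_000_000
}

for stride in [1, 10, 100, 1_000] {
    print("stride \(stride): \(timeStride(stride, count: 100_000))s elapsed")
}
```

Typically very small strides are dominated by dispatch overhead and very large strides under-utilize the cores, so the sweet spot sits somewhere in between.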
I can't run this code (I don't have sample data, I don't have the definition of Datas, etc.), so I apologize if I introduced any issues. But don't focus on the code here; instead, focus on the broader issues of using concurrentPerform for performing concurrent loops, and of striding to ensure that each thread has enough work that the threading overhead doesn't outweigh the broader benefits of running threads in parallel.
For more information, see https://stackoverflow.com/a/22850936/1271826 for a broader discussion of the issues here.
Answer 2:
Your expectations may be wrong. Your goal was to free up the main thread, and you did that. That is what is now faster: the main thread!
But async on a background thread means "please do this any old time you please, allowing it to pause so other code can run in the middle of it"; it doesn't mean "do it fast", not at all. And I don't see any qos specification in your code, so it's not as if you are asking for special attention or anything.
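For reference, a quality-of-service class can be supplied when the queue is created. This is a minimal sketch: the label is borrowed from the question, and the choice of .userInitiated is illustrative rather than a recommendation.

```swift
import Dispatch

// A concurrent queue created with an explicit quality-of-service, so work
// dispatched to it competes for CPU time at .userInitiated priority rather
// than the unspecified default.
let workQueue = DispatchQueue(label: "construction_worker_1",
                              qos: .userInitiated,
                              attributes: .concurrent)

workQueue.async {
    // heavy computation here runs at the queue's elevated QoS
}
```

Without an explicit qos, the system is free to schedule the work at a low priority, which can make async work feel much slower even though the main thread is now unblocked.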
Source: https://stackoverflow.com/questions/46498928/code-runs-faster-when-queued-synchronously-than-asynchronously-shouldnt-it-be