Question
I am trying to speed up a process that slows down my main thread by distributing it across at least two different cores.
The reason I think I can pull this off is that each of the individual operations is independent, requiring only two points and a float.
However, my first stab at it has the code running significantly slower when doing queue.async vs queue.sync, and I have no clue why!
Here is the code running synchronously:
var block = UnsafeMutablePointer<Datas>.allocate(capacity: 0)
var outblock = UnsafeMutablePointer<Decimal>.allocate(capacity: 0)

func initialise()
{
    outblock = UnsafeMutablePointer<Decimal>.allocate(capacity: testWith * 4 * 2)
    block = UnsafeMutablePointer<Datas>.allocate(capacity: particles.count)
}
func update()
{
    var i = 0
    for part in particles
    {
        part.update()
        let x1 = part.data.p1.x; let y1 = part.data.p1.y
        let x2 = part.data.p2.x; let y2 = part.data.p2.y
        let w = part.data.size * rectScale
        let w2 = part.data.size * rectScale
        let dy = y2 - y1; let dx = x2 - x1
        let length = sqrt(dy * dy + dx * dx)
        let calcx = (-(y2 - y1) / length)
        let calcy = ((x2 - x1) / length)
        let calcx1 = calcx * w
        let calcy1 = calcy * w
        let calcx2 = calcx * w2
        let calcy2 = calcy * w2
        outblock[i]   = x1 + calcx1
        outblock[i+1] = y1 + calcy1
        outblock[i+2] = x1 - calcx1
        outblock[i+3] = y1 - calcy1
        outblock[i+4] = x2 + calcx2
        outblock[i+5] = y2 + calcy2
        outblock[i+6] = x2 - calcx2
        outblock[i+7] = y2 - calcy2
        i += 8
    }
}
Here is my attempt at distributing the workload among multiple cores:
let queue = DispatchQueue(label: "construction_worker_1", attributes: .concurrent)
let blocky = block
let oblocky = outblock
for i in 0..<particles.count
{
    particles[i].update()
    block[i] = particles[i].data // Copy the raw data into a thread-safe format
    queue.async {
        let x1 = blocky[i].p1.x; let y1 = blocky[i].p1.y
        let x2 = blocky[i].p2.x; let y2 = blocky[i].p2.y
        let w = blocky[i].size * rectScale
        let w2 = blocky[i].size * rectScale
        let dy = y2 - y1; let dx = x2 - x1
        let length = sqrt(dy * dy + dx * dx)
        let calcx = (-(y2 - y1) / length)
        let calcy = ((x2 - x1) / length)
        let calcx1 = calcx * w
        let calcy1 = calcy * w
        let calcx2 = calcx * w2
        let calcy2 = calcy * w2
        let writeIndex = i * 8
        oblocky[writeIndex]   = x1 + calcx1
        oblocky[writeIndex+1] = y1 + calcy1
        oblocky[writeIndex+2] = x1 - calcx1
        oblocky[writeIndex+3] = y1 - calcy1
        oblocky[writeIndex+4] = x2 + calcx2
        oblocky[writeIndex+5] = y2 + calcy2
        oblocky[writeIndex+6] = x2 - calcx2
        oblocky[writeIndex+7] = y2 - calcy2
    }
}
I really have no clue why this slowdown is happening! I am using UnsafeMutablePointer so the data is thread safe, and I am ensuring that no variable can ever be read or written by multiple threads at the same time.
What is going on here?
Answer 1:
As described in Performing Loop Iterations Concurrently, there is overhead with each block dispatched to a background queue. So you will want to “stride” through your array, letting each iteration process multiple data points, not just one.
Also, dispatch_apply, called concurrentPerform in Swift 3 and later, is designed for performing loops in parallel, and it's optimized for the particular device's cores. Combined with striding, you should achieve some performance benefit:
DispatchQueue.global(qos: .userInitiated).async {
    let stride = 100
    let iterations = (particles.count + stride - 1) / stride  // round up so the last partial stride is processed
    DispatchQueue.concurrentPerform(iterations: iterations) { iteration in
        let start = iteration * stride
        let end = min(start + stride, particles.count)
        for i in start ..< end {
            particles[i].update()
            block[i] = particles[i].data // Copy the raw data into a thread-safe format
            let x1 = blocky[i].p1.x; let y1 = blocky[i].p1.y
            let x2 = blocky[i].p2.x; let y2 = blocky[i].p2.y
            let w = blocky[i].size * rectScale
            let w2 = blocky[i].size * rectScale
            let dy = y2 - y1; let dx = x2 - x1
            let length = hypot(dy, dx)
            let calcx = -dy / length
            let calcy = dx / length
            let calcx1 = calcx * w
            let calcy1 = calcy * w
            let calcx2 = calcx * w2
            let calcy2 = calcy * w2
            let writeIndex = i * 8
            oblocky[writeIndex]   = x1 + calcx1
            oblocky[writeIndex+1] = y1 + calcy1
            oblocky[writeIndex+2] = x1 - calcx1
            oblocky[writeIndex+3] = y1 - calcy1
            oblocky[writeIndex+4] = x2 + calcx2
            oblocky[writeIndex+5] = y2 + calcy2
            oblocky[writeIndex+6] = x2 - calcx2
            oblocky[writeIndex+7] = y2 - calcy2
        }
    }
}
Note that there is no queue.async inside the loop: concurrentPerform already runs the iterations in parallel, and dispatching each body asynchronously again would reintroduce exactly the per-block overhead we're trying to avoid.
You should experiment with different stride values and see how the performance changes.
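To compare stride values empirically, a small timing harness works well. This is a minimal sketch, not code from the question: the function name timeStride and the sqrt placeholder workload are illustrative stand-ins for the real per-particle math.

```swift
import Foundation
import Dispatch

// Hypothetical micro-benchmark: times one striding configuration so that
// different stride values can be compared on the actual device.
func timeStride(_ stride: Int, count: Int) -> Double {
    let start = DispatchTime.now()
    let iterations = (count + stride - 1) / stride  // round up to cover the tail
    DispatchQueue.concurrentPerform(iterations: iterations) { iteration in
        let lo = iteration * stride
        let hi = min(lo + stride, count)
        for i in lo ..< hi {
            _ = sqrt(Double(i))  // placeholder for the real per-particle work
        }
    }
    let end = DispatchTime.now()
    return Double(end.uptimeNanoseconds - start.uptimeNanoseconds) / 1_000_000_000
}

for stride in [1, 10, 100, 1_000] {
    print("stride \(stride): \(timeStride(stride, count: 100_000))s elapsed")
}
```

Typically very small strides are dominated by dispatch overhead and very large strides under-utilize the cores, so the sweet spot sits somewhere in between.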
I can't run this code (I don't have sample data, I don't have the definition of Datas, etc.), so I apologize if I introduced any issues. But don't focus on the code here; instead, focus on the broader issues of using concurrentPerform for performing concurrent loops, and of striding to ensure that each thread has enough work that the threading overhead doesn't outweigh the broader benefits of running threads in parallel.
For more information, see https://stackoverflow.com/a/22850936/1271826 for a broader discussion of the issues here.
Answer 2:
Your expectations may be wrong. Your goal was to free up the main thread, and you did that. That is what is now faster: the main thread!
But async on a background thread means "please do this any old time you please, allowing it to pause so other code can run in the middle of it"; it doesn't mean "do it fast", not at all. And I don't see any qos specification in your code, so it's not as if you are asking for special attention or anything.
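For reference, a quality-of-service class can be supplied when the queue is created. This is a minimal sketch: the label is borrowed from the question, and the choice of .userInitiated is illustrative rather than a recommendation.

```swift
import Dispatch

// A concurrent queue created with an explicit quality-of-service, so work
// dispatched to it competes for CPU time at .userInitiated priority rather
// than the unspecified default.
let workQueue = DispatchQueue(label: "construction_worker_1",
                              qos: .userInitiated,
                              attributes: .concurrent)

workQueue.async {
    // heavy computation here runs at the queue's elevated QoS
}
```

Without an explicit qos, the system is free to schedule the work at a low priority, which can make async work feel much slower even though the main thread is now unblocked.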
Source: https://stackoverflow.com/questions/46498928/code-runs-faster-when-queued-synchronously-than-asynchronously-shouldnt-it-be