Question
I'm sure someone has come across this problem before. I have a 2D optimisation grid of size NxM with the constraint n_i <= m_i, i.e. I only want to calculate the pairs in the lower triangular section of the matrix. At the moment I naively launch all NxM combinations as N local groups of M work groups (using localGroupID and workGroupID to identify the pair) and return -inf when the constraint fails, to save computation.
Is there a better way to set up and index the threads so that I only need to generate (NxM)/2 threads rather than the full NxM?
Many thanks, Sam
Answer 1:
Of course, it's just geometry. Any right triangle can be cut up and rearranged into a rectangle of the same area: slice it in half horizontally and vertically, then reassemble the pieces into a rectangle. In terms of implementation, make the global work size as wide as the triangle and half as tall. In the kernel, if the x coordinate is more than half the width, check whether (x - half) > y; if so, set x = width - x - 1 and y = y + half_height. You'll get some thread divergence along the boundary, but you won't leave half your work items idle.
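The folding idea can be checked on the host before it goes into a kernel. The Python sketch below is one possible variant, not the exact formula given above: it pairs triangle row y with its mirror row n - 1 - y, which also covers the diagonal. It assumes a square n x n grid and a launch shape of (n + 1) by ceil(n / 2) work items. The names `fold_to_lower_triangle` and `covered_pairs` are hypothetical, and `x`, `y` stand in for `get_global_id(0)` and `get_global_id(1)`.

```python
def fold_to_lower_triangle(x, y, n):
    """Map work item (x, y) of the folded rectangle to (row, col) with col <= row.

    Assumed launch shape: x in [0, n], y in [0, (n + 1) // 2).
    Returns None for the padding items that exist only when n is odd.
    """
    if x <= y:
        # The first y + 1 items of rectangle row y are triangle row y itself.
        return (y, x)
    row = n - 1 - y  # mirror row, counted from the bottom of the triangle
    if row == y:
        # Odd n: the middle row has no mirror partner, so these items idle.
        return None
    # The remaining n - y items fill the mirror row, which has n - y cells.
    return (row, n - x)


def covered_pairs(n):
    """Enumerate every work item of the folded grid and collect its pair."""
    pairs = []
    for y in range((n + 1) // 2):
        for x in range(n + 1):
            pair = fold_to_lower_triangle(x, y, n)
            if pair is not None:
                pairs.append(pair)
    return pairs


if __name__ == "__main__":
    for n in (4, 5):
        print(n, sorted(covered_pairs(n)))
```

With this layout an even n leaves no work item idle, and an odd n leaves (n + 1) / 2 idle, all of them in the last rectangle row. The branch on `x <= y` is where divergence shows up in a kernel.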
Source: https://stackoverflow.com/questions/24021305/opencl-efficient-way-to-group-a-lower-triangular-matrix