OpenCL efficient way to group a lower triangular matrix

Submitted by 冷暖自知 on 2019-12-11 11:43:51

Question


I'm sure someone has come across this problem before. I have a 2D optimisation grid, NxM in size, with the constraint that n_i <= m_i, i.e. I only want to evaluate the pairs in the lower triangular section of the matrix. At the moment I naively launch all NxM combinations as N local groups of M work groups (using localGroupID and workGroupID to identify the pair) and return -inf when the constraint fails, to save computation.

Is there a better way to set up the threads and index them, so that I only need to generate (NxM)/2 threads rather than the full NxM?

Many thanks, Sam


Answer 1:


Of course, it's just geometry. Any right triangle can be cut up and rearranged into a rectangle of the same area: slice it in half horizontally and vertically, then reassemble the pieces. In terms of implementation, make the global work size as wide as the triangle and half as tall as the triangle. In the kernel, if the x coordinate is more than half the width, check whether (x - half) > y, where half is half the width; if so, set x = width - x - 1 and y = y + half_height. You'll have some thread divergence along the boundary, but you won't leave half your work-items idle.
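To make the folding idea concrete, here is a minimal sketch in plain C of one exact variant of it. It is not the answerer's formula verbatim: it pairs row y of the triangle with row n - 1 - y, which is a common way to do the same fold. It assumes a square n x n grid with the diagonal included (col <= row), and the names `Cell`, `fold_width`, `fold_height` and `fold_to_triangle` are made up for illustration.

```c
#include <stdbool.h>

/* One cell of the lower triangle: col <= row, diagonal included. */
typedef struct {
    int row;
    int col;
    bool valid; /* false only for padding items, which occur when n is odd */
} Cell;

/* Size of the folded rectangle for an n x n triangle (hypothetical helpers). */
int fold_width(int n)  { return n + 1; }
int fold_height(int n) { return (n + 1) / 2; }

/*
 * Map a rectangle coordinate (x, y) to a triangle cell.
 * Triangle row y holds y + 1 cells and row n - 1 - y holds n - y cells,
 * so the two rows together fill one rectangle row of n + 1 items.
 */
Cell fold_to_triangle(int x, int y, int n)
{
    Cell c;
    if (x <= y) {
        /* Left part of the rectangle row: short triangle row y. */
        c.row = y;
        c.col = x;
        c.valid = true;
    } else {
        /* Right part: the mirrored, longer triangle row n - 1 - y. */
        c.row = n - 1 - y;
        c.col = x - (y + 1);
        /* For odd n the middle row mirrors onto itself; skip the duplicates. */
        c.valid = (c.row != y);
    }
    return c;
}
```

In an OpenCL kernel, x and y would presumably come from get_global_id(0) and get_global_id(1), with a global work size of fold_width(n) by fold_height(n), and work-items whose cell is not valid would return immediately. For even n every work-item maps to a distinct cell, so n(n+1)/2 items are launched instead of n*n; for odd n only part of the middle rectangle row is idle.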



Source: https://stackoverflow.com/questions/24021305/opencl-efficient-way-to-group-a-lower-triangular-matrix
