Reduction of matrix rows in OpenCL
Question: I have a matrix stored as a 1D array on the GPU, and I'm trying to write an OpenCL kernel that performs a reduction on every row of this matrix. For example, consider a 2x3 matrix with the elements [1, 2, 3, 4, 5, 6]. What I want is:

    [1, 2, 3]  =  [ 6]
    [4, 5, 6]     [15]

Since I'm talking about a reduction, the actual result could contain more than one element per row:

    [1, 2, 3]  =  [3, 3]
    [4, 5, 6]     [9, 6]

I can then do the final calculation in another kernel or on the CPU. Well,