CUDA: Getting max value and its index in an array

旧巷少年郎 2021-01-03 01:57

I have several blocks, where each block executes on a separate part of an integer array. As an example: block one from array[0] to array[9] and block two from array[10] to array

5 answers
  •  陌清茗 (original poster)
    2021-01-03 02:58

    The size of your array relative to shared memory is almost irrelevant, since the limiting factor is the number of threads in each block rather than the size of the array. One solution is to have each thread block work on a portion of the array the same size as the thread block. That is, if you have 512 threads per block, then block n looks at array[ 512*n ] through array[ 512*n + 511 ]. Each block does a reduction to find the highest member in its portion of the array. Then you bring the max of each section back to the host and do a simple linear search to locate the highest value in the overall array. Each reduction on the GPU shrinks the linear search by a factor of 512. Depending on the size of the array, you might want to do more reductions before you bring the data back. (If your array is 3*512^10 in size, you might want to do 10 reductions on the GPU and have the host search through the 3 remaining data points.)
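The approach described in the answer above can be sketched roughly as follows. This is a minimal illustration, not a tuned implementation: the names `block_max_kernel`, `find_max_with_index`, and `kThreads` are hypothetical, the block size of 512 is taken from the answer, and carrying the index alongside the value (with ties resolved toward the lowest index) is my assumption about how to cover the "and its index" part of the question. Error checking on the CUDA calls is omitted for brevity.

```cuda
#include <climits>
#include <utility>
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreads = 512;  // threads per block (assumed; must be a power of two)

// Each block reduces its own kThreads-element slice to one (max, index) pair.
__global__ void block_max_kernel(const int* data, int n,
                                 int* block_max, int* block_idx) {
    __shared__ int s_val[kThreads];
    __shared__ int s_idx[kThreads];

    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;

    // Threads past the end contribute INT_MIN with index INT_MAX,
    // so they can never beat a real element, even on a tie.
    s_val[tid] = (gid < n) ? data[gid] : INT_MIN;
    s_idx[tid] = (gid < n) ? gid : INT_MAX;
    __syncthreads();

    // Tree reduction in shared memory: half as many active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) {
            int v = s_val[tid + stride];
            int i = s_idx[tid + stride];
            // Take the larger value; on equal values keep the lower index.
            if (v > s_val[tid] || (v == s_val[tid] && i < s_idx[tid])) {
                s_val[tid] = v;
                s_idx[tid] = i;
            }
        }
        __syncthreads();
    }

    // Thread 0 writes this block's result.
    if (tid == 0) {
        block_max[blockIdx.x] = s_val[0];
        block_idx[blockIdx.x] = s_idx[0];
    }
}

// Host wrapper: one GPU reduction pass, then a linear search on the host.
// Returns {max value, index}; {INT_MIN, -1} for an empty input.
std::pair<int, int> find_max_with_index(const std::vector<int>& host) {
    int n = static_cast<int>(host.size());
    if (n == 0) return {INT_MIN, -1};
    int blocks = (n + kThreads - 1) / kThreads;

    int *d_data = nullptr, *d_max = nullptr, *d_idx = nullptr;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMalloc(&d_max, blocks * sizeof(int));
    cudaMalloc(&d_idx, blocks * sizeof(int));
    cudaMemcpy(d_data, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    block_max_kernel<<<blocks, kThreads>>>(d_data, n, d_max, d_idx);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    std::vector<int> h_max(blocks), h_idx(blocks);
    cudaMemcpy(h_max.data(), d_max, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_idx.data(), d_idx, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_max);
    cudaFree(d_idx);

    // Linear search over one candidate per block.
    int best = 0;
    for (int b = 1; b < blocks; ++b) {
        if (h_max[b] > h_max[best]) best = b;
    }
    return {h_max[best], h_idx[best]};
}
```

Calling `find_max_with_index(values)` on a `std::vector<int>` returns the maximum and its position in the original array. For very large arrays, the same kernel idea could be applied again to the per-block results before copying back, as the answer suggests, though that second pass would need to carry the stored indices through instead of recomputing them from the thread ID.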
