Finding max value in CUDA

Posted by 混江龙づ霸主 on 2019-12-10 10:03:31

Question


I am trying to write CUDA code that finds the maximum value in a given set of numbers.

Assume you have 20 numbers and the kernel runs on 2 blocks of 5 threads. Now assume the 10 threads compare the first 10 values at the same time, and thread 2 finds a larger value, so thread 2 updates the max-value variable in global memory. While thread 2 is updating it, what happens to the remaining threads (1, 3-10), which are still comparing against the old value?

If I lock the global variable using atomicCAS(), will threads 1 and 3-10 still compare against the old max value? How can I overcome this problem?


Answer 1:


This is purely a reduction problem. Here's a good presentation by NVIDIA on optimizing reductions on GPUs. You can use the same technique to find the minimum, the maximum, or the sum of all elements.
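As a rough illustration of the reduction approach, here is a minimal sketch of a shared-memory tree reduction for the maximum. It is not taken from the NVIDIA presentation; the names `block_max` and `max_on_gpu` are made up for this example, the block size of 256 is an arbitrary choice, and error checking is left out for brevity. Each block reduces its own slice, so no thread ever races on a single global variable.

```cuda
#include <algorithm>
#include <cfloat>
#include <vector>
#include <cuda_runtime.h>

// Each block reduces its slice of `in` to one maximum and writes it to
// out[blockIdx.x]. Assumes blockDim.x is a power of two and that the launch
// provides blockDim.x * sizeof(float) bytes of dynamic shared memory.
__global__ void block_max(const float *in, float *out, int n)
{
    extern __shared__ float sdata[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Out-of-range threads contribute the identity element for max.
    sdata[tid] = (i < n) ? in[i] : -FLT_MAX;
    __syncthreads();

    // Tree reduction: halve the number of active threads at every step.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            sdata[tid] = fmaxf(sdata[tid], sdata[tid + s]);
        __syncthreads();
    }

    if (tid == 0)
        out[blockIdx.x] = sdata[0];
}

// Hypothetical host helper: maximum of n floats in host memory (n > 0).
// CUDA error checking is omitted to keep the sketch short.
float max_on_gpu(const float *h_in, int n)
{
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, blocks * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    block_max<<<blocks, threads, threads * sizeof(float)>>>(d_in, d_out, n);

    // Only `blocks` partial maxima remain, so finish the reduction on the host.
    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), d_out, blocks * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return *std::max_element(partial.begin(), partial.end());
}
```

Calling `max_on_gpu(values, 20)` on a host array of 20 floats would launch a single block here. For large inputs, the partial maxima could instead be reduced by a second kernel launch rather than on the host.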




Answer 2:


The link to the Thrust library is broken. If anyone wants to use Thrust for this, the relevant documentation is here: Thrust, extrema reductions.
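For reference, a minimal sketch of what the Thrust route might look like, using `thrust::max_element` from `<thrust/extrema.h>`. The wrapper name `thrust_max` is made up for this example, and it assumes a non-empty input.

```cuda
#include <vector>
#include <thrust/device_vector.h>
#include <thrust/extrema.h>

// Hypothetical wrapper: copies the host data to the GPU and returns its
// maximum. Assumes h is not empty.
float thrust_max(const std::vector<float> &h)
{
    // Constructing a device_vector from host iterators copies host -> device.
    thrust::device_vector<float> d(h.begin(), h.end());

    // max_element returns an iterator to the largest element.
    thrust::device_vector<float>::iterator it =
        thrust::max_element(d.begin(), d.end());

    // Dereferencing the device iterator copies that one float back to the host.
    return *it;
}
```

If the position of the maximum is needed as well, `it - d.begin()` gives its index.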




Answer 3:


Unless you're trying to write a reduction kernel yourself, the simplest way is to use cuBLAS.
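One caveat if you go this way: the cuBLAS routine for this, `cublasIsamax`, returns the 1-based index of the element with the largest absolute value, not the maximum value itself, so it only matches the true maximum when the largest-magnitude element is positive. A minimal sketch under that caveat follows; the helper name `index_of_max_abs` is made up for this example, error checking is omitted, and the program has to be linked with `-lcublas`.

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>

// Hypothetical helper: returns the 0-based index of the element of h_x with
// the largest absolute value. Status codes are ignored to keep this short.
int index_of_max_abs(const float *h_x, int n)
{
    float *d_x = nullptr;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // With the default (host) pointer mode, the result is written to host
    // memory. cuBLAS uses Fortran-style 1-based indexing for the result.
    int idx = 0;
    cublasIsamax(handle, n, d_x, 1, &idx);

    cublasDestroy(handle);
    cudaFree(d_x);
    return idx - 1;  // convert to 0-based
}
```

The value can then be read back as `h_x[index_of_max_abs(h_x, n)]`.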




Answer 4:


I looked for the same answer but found most of them too daunting for a newbie like me. Here is my example code to find the max. Please let me know whether this is used properly.

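__global__
void find_max(int max_x, int max_y, float *tot, float *x, float *y)
{
    int i = blockIdx.x*blockDim.x + threadIdx.x;
    int j = blockIdx.y*blockDim.y + threadIdx.y;
    if(i < max_x && j<max_y) {
        // A plain "if (*tot < x[i]) atomicExch(tot, x[i]);" is racy: another
        // thread can store a larger value between the test and the exchange.
        // The compare-and-swap loop below makes test and store one atomic step.
        // *tot must be initialized (e.g. to -FLT_MAX) before the launch.
        int *addr = (int *)tot;
        int old = *addr, assumed;
        do {
            assumed = old;
            if(__int_as_float(assumed) >= x[i]) break;  // current max is already at least x[i]
            old = atomicCAS(addr, assumed, __float_as_int(x[i]));
        } while(old != assumed);  // retry if another thread changed *tot first
    }
}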


Source: https://stackoverflow.com/questions/5255962/finding-max-value-in-cuda
