Find max/min in CUDA without passing it to the CPU

醉酒当歌 提交于 2019-12-04 05:37:39

问题


I need to find the index of the maximum element in an array of floats. I am using the function "cublasIsamax", but this returns the index to the CPU, and this is slowing down the running time of the application.

Is there a way to compute this index efficiently and store it in the GPU?

Thanks!


回答1:


Since the CUBLAS V2 API was introduced (with CUDA 4.0, IIRC), it is possible to have routines which return a scalar or index to store those directly into a variable in device memory, rather than into a host variable (which entails a device to host transfer and might leave the result in the wrong memory space).

To use this, you need to use the cublasSetPointerMode call to tell the CUBLAS context to expect pointers for scalar arguments to be device pointers by using the CUBLAS_POINTER_MODE_DEVICE mode. This then implies that in a call like

cublasStatus_t cublasIsamax(cublasHandle_t handle, int n,
                            const float *x, int incx, int *result)

that result must be a device pointer.




回答2:


If you want to use CUBLAS and you have a GPU with compute capability 3.5 (K20, Titan) than you can use CUBLAS with dynamic parallelism. Than you can call CUBLAS from within a kernel on the GPU and no data will be returned to the CPU. If you have no device with cc 3.5 you will probably have to implement a find max function by yourself or look for an aditional library.



来源:https://stackoverflow.com/questions/18510485/find-max-min-in-cuda-without-passing-it-to-the-cpu

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!