How to manage the same CUDA kernel call from multiple CPU threads?

I have a CUDA kernel which works fine when called from a single CPU thread. However, when the same kernel is called from multiple CPU threads (~100), most of the kernel calls seem not to be executed at all, as the results come out as all zeros. Can someone please guide me on how to resolve this problem?

In the current version of the code I am calling cudaDeviceSynchronize() after the kernel call. Will adding a sync command before cudaMalloc() and the kernel call be of any help in this case?

There is another thing which needs some clarification: if two CPU threads execute the same cudaMalloc() command, will the later one overwrite the former in GPU memory, or will each create its own memory?

Thanks in advance for your help

Usually a single CPU thread is used for calling a CUDA kernel. However, since CUDA 4.0, multiple CPU threads can share a context. You can use cuCtxSetCurrent to bind the context to the calling thread. More information about this API function can be found in the CUDA Driver API documentation.

Another workaround is to create a single GPU worker thread that holds the context, and pass every CUDA request to that thread.
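The worker-thread idea can be sketched roughly as below. This is only an illustration of the pattern, not code from the original poster: GpuWorker, fake_kernel and sum_of_squares_via_worker are hypothetical names, and the actual CUDA work (allocation, kernel launch, copy back) is replaced by a plain CPU function so that the sketch compiles and runs without a GPU. In a real program, the worker would set its device or context once at the top of run(), and each submitted job would contain the CUDA calls.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical helper: one thread that would own the CUDA context.
// Callers hand it closures; only this thread would ever touch the GPU.
class GpuWorker {
public:
    GpuWorker() : worker_([this] { run(); }) {}

    ~GpuWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();  // the worker drains pending jobs before exiting
    }

    // Queue a job and return a future that will hold its result.
    std::future<int> submit(std::function<int()> job) {
        std::packaged_task<int()> task(std::move(job));
        std::future<int> result = task.get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(task));
        }
        cv_.notify_one();
        return result;
    }

private:
    void run() {
        // Assumption: a real worker would select its device/context here, once.
        for (;;) {
            std::packaged_task<int()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stop requested and queue drained
                task = std::move(jobs_.front());
                jobs_.pop();
            }
            task();  // executes on the worker thread, outside the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::packaged_task<int()>> jobs_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the members above exist before it starts
};

// Stand-in for "allocate, launch kernel, copy back": squares its input on the CPU.
int fake_kernel(int x) { return x * x; }

// Many CPU threads submit work; all of it executes on the single worker thread.
int sum_of_squares_via_worker(int num_threads) {
    assert(num_threads >= 0);
    GpuWorker worker;
    std::vector<int> results(static_cast<std::size_t>(num_threads), 0);
    std::vector<std::thread> callers;
    for (int i = 0; i < num_threads; ++i) {
        callers.emplace_back([&worker, &results, i] {
            // Each caller blocks until the worker has finished its job.
            results[static_cast<std::size_t>(i)] =
                worker.submit([i] { return fake_kernel(i); }).get();
        });
    }
    for (auto& t : callers) t.join();
    int sum = 0;
    for (int r : results) sum += r;
    return sum;
}
```

Calling sum_of_squares_via_worker(100) mimics the ~100 CPU threads from the question: every thread gets its own result back, but the jobs run one after another on a single thread, so only that thread would ever need a current context.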

Regarding your other question: without the context being set for the calling thread, I remember that cudaMalloc would not even execute (I work with JCuda, so the behavior may be a little different). But if the context is current on the calling thread, each cudaMalloc call returns its own allocation, so the memory will not be overwritten.

Source: https://stackoverflow.com/questions/21779985/how-to-manage-same-cuda-kernel-call-from-multiple-cpu-threads

Tags

multithreading

cuda

thread-safety

gpu

gpgpu
