CUDA: How to pass multiple duplicated arguments to CUDA Kernel

Submitted by 断了今生、忘了曾经 on 2019-12-13 08:14:55

Question


I'm looking for an elegant way to pass multiple duplicated arguments to a CUDA kernel.

As we all know, each kernel argument is located on the stack of each CUDA thread; therefore, arguments passed to the kernel may be duplicated across threads, with the same values occupying memory on every thread's stack.

To minimize the number of duplicated arguments being passed, I'm looking for an elegant way of doing so.

To explain my concern, let's say my code looks like this:

   kernelFunction<<<gridSize, blockSize>>>(imageWidth, imageHeight, imageStride, numberOfElements, x, y, ...);

The UINT imageWidth, UINT imageHeight, UINT imageStride, and UINT numberOfElements arguments would then be duplicated on every thread's stack.

I'm looking for a trick to pass fewer arguments and to access that data from some other source instead.

I was thinking about using constant memory, but since constant memory resides in global (device) memory, I dropped the idea. Needless to say, the memory used should be fast.
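
Roughly, the constant-memory workaround I was considering would look like the sketch below (the KernelParams struct and d_params symbol names are purely illustrative, and the answer below explains why this turns out to be unnecessary):

   // Illustrative sketch only: a parameter block held in __constant__ memory
   // instead of being passed as individual kernel arguments.
   struct KernelParams {
       unsigned int imageWidth;
       unsigned int imageHeight;
       unsigned int imageStride;
       unsigned int numberOfElements;
   };

   __constant__ KernelParams d_params;   // resides in constant memory, cached on-chip

   __global__ void kernelFunction(const float *x, float *y)
   {
       unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
       if (idx < d_params.numberOfElements) {
           // every thread reads the same values through the constant cache
           y[idx] = x[idx] + d_params.imageWidth;
       }
   }

   // Host side, once before the launch:
   //   KernelParams h_params = { width, height, stride, count };
   //   cudaMemcpyToSymbol(d_params, &h_params, sizeof(h_params));
   //   kernelFunction<<<gridSize, blockSize>>>(x, y);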

Any help would be appreciated.


Answer 1:


Kernel arguments are passed in via constant memory (or shared memory on sm_1x), so there is no per-thread replication as you suggest.

cf. the programming guide:

__global__ function parameters are passed to the device:

  • via shared memory and are limited to 256 bytes on devices of compute capability 1.x,
  • via constant memory and are limited to 4 KB on devices of compute capability 2.x and higher.

Of course, if you subsequently modify one of the variables in your code then you're modifying a local copy (as per the C standard), and hence each thread will have its own copy, either in registers or, if needed, on the stack.
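
A minimal, hypothetical kernel (not from the original post) to illustrate that last point:

   __global__ void scaleKernel(unsigned int numberOfElements, float factor, float *data)
   {
       unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;

       // Reading the parameters as-is: every thread sees the same values,
       // served from constant memory (on compute capability 2.x and higher),
       // so nothing is replicated per thread.
       if (idx < numberOfElements) {
           data[idx] *= factor;
       }

       // Only a write to a parameter forces a private, per-thread copy,
       // typically held in a register (or spilled to local memory if needed):
       // numberOfElements -= idx;   // would be a purely local modification
   }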



Source: https://stackoverflow.com/questions/14857371/cuda-how-to-pass-multiple-duplicated-arguments-to-cuda-kernel
