CUDA: How to pass multiple duplicated arguments to CUDA Kernel

Submitted by 断了今生、忘了曾经 on 2019-12-13 08:14:55

Question


I'm looking for an elegant way to pass multiple duplicated arguments to a CUDA kernel.

As we all know, each kernel argument is located on the stack of each CUDA thread; therefore, arguments passed to the kernel may be duplicated across threads, with the same values occupying memory on every thread's stack.

To minimize the number of duplicated arguments being passed, I'm looking for an elegant way of doing so.

To explain my concern, let's say my code looks like this:

   kernelFunction<<<gridSize, blockSize>>>(imageWidth, imageHeight, imageStride, numberOfElements, x, y, ...);

The UINT imageWidth, UINT imageHeight, UINT imageStride, and UINT numberOfElements arguments would then be duplicated on every thread's stack.

I'm looking for a trick to pass fewer arguments and to access that data from some other source instead.

I was thinking about using constant memory, but since constant memory resides in global (device) memory, I dropped the idea. Needless to say, the memory used should be fast.
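
Roughly, the constant-memory workaround I was considering would look like the sketch below (the KernelParams struct and d_params symbol names are purely illustrative, and the answer below explains why this turns out to be unnecessary):

   // Illustrative sketch only: a parameter block held in __constant__ memory
   // instead of being passed as individual kernel arguments.
   struct KernelParams {
       unsigned int imageWidth;
       unsigned int imageHeight;
       unsigned int imageStride;
       unsigned int numberOfElements;
   };

   __constant__ KernelParams d_params;   // resides in constant memory, cached on-chip

   __global__ void kernelFunction(const float *x, float *y)
   {
       unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
       if (idx < d_params.numberOfElements) {
           // every thread reads the same values through the constant cache
           y[idx] = x[idx] + d_params.imageWidth;
       }
   }

   // Host side, once before the launch:
   //   KernelParams h_params = { width, height, stride, count };
   //   cudaMemcpyToSymbol(d_params, &h_params, sizeof(h_params));
   //   kernelFunction<<<gridSize, blockSize>>>(x, y);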

Any help would be appreciated.


Answer 1:


Kernel arguments are passed in via constant memory (or shared memory on sm_1x), so there is no per-thread replication as you suggest.

cf. the programming guide:

__global__ function parameters are passed to the device:

  • via shared memory and are limited to 256 bytes on devices of compute capability 1.x,
  • via constant memory and are limited to 4 KB on devices of compute capability 2.x and higher.

Of course, if you subsequently modify one of the variables in your code then you're modifying a local copy (as per the C standard), and hence each thread will have its own copy, either in registers or, if needed, on the stack.
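
A minimal, hypothetical kernel (not from the original post) to illustrate that last point:

   __global__ void scaleKernel(unsigned int numberOfElements, float factor, float *data)
   {
       unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;

       // Reading the parameters as-is: every thread sees the same values,
       // served from constant memory (on compute capability 2.x and higher),
       // so nothing is replicated per thread.
       if (idx < numberOfElements) {
           data[idx] *= factor;
       }

       // Only a write to a parameter forces a private, per-thread copy,
       // typically held in a register (or spilled to local memory if needed):
       // numberOfElements -= idx;   // would be a purely local modification
   }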



Source: https://stackoverflow.com/questions/14857371/cuda-how-to-pass-multiple-duplicated-arguments-to-cuda-kernel
