Shared memory and streams when launching a kernel

Submitted by 对着背影说爱祢 on 2019-12-11 04:08:54

Question


I'm new to CUDA and working on a personal project. I know that, to specify the amount of shared memory at launch, you write:

kernel<<<grid_size,block_size,shared_mem_size>>>(parameters);

On the other hand, to put a kernel into a stream, you write:

kernel<<<grid_size,block_size,0,stream_being_used>>>(parameters);

I don't understand why the third parameter is 0 when a stream is used. (I got this from chapter 10 of "CUDA by Example" by Sanders and Kandrot.)

If I want to specify the shared memory at launch AND put it into a stream, how do I do that correctly? In other words, what should the parameters in between <<<...>>> look like?


Answer 1:


The 0 is there only because, in that particular example, no dynamic shared memory is required.

Shared memory can be allocated either statically (without extern, in which case the size is stated explicitly in the declaration) or dynamically (with extern, in which case the size is passed as the 3rd parameter of the kernel launch configuration).
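To make the static/dynamic distinction concrete, here is a minimal sketch. The kernel names (reverse_static, reverse_dynamic), the helper round_trip_ok, and the block size of 64 are made up for illustration and are not from the book or the original question. Both kernels reverse one block of integers through shared memory; the only difference is how the shared array is declared and whether the launch passes a size.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Static allocation: no extern, the size (64 ints) is fixed in the declaration.
// Hypothetical kernel; assumes it is launched with a single block of at most 64 threads.
__global__ void reverse_static(int *data)
{
    __shared__ int tile[64];
    int t = threadIdx.x;
    tile[t] = data[t];
    __syncthreads();                      // all writes to tile must finish before reading
    data[t] = tile[blockDim.x - 1 - t];
}

// Dynamic allocation: extern and no size in the declaration.
// The size in bytes comes from the 3rd launch configuration parameter.
__global__ void reverse_dynamic(int *data)
{
    extern __shared__ int tile[];
    int t = threadIdx.x;
    tile[t] = data[t];
    __syncthreads();
    data[t] = tile[blockDim.x - 1 - t];
}

// Reverses the array twice (once per kernel) and checks that the
// original order is restored.
bool round_trip_ok()
{
    const int n = 64;
    int host[n];
    for (int i = 0; i < n; ++i) host[i] = i;

    int *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) return false;
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);

    reverse_static<<<1, n>>>(dev);                    // no 3rd parameter needed
    reverse_dynamic<<<1, n, n * sizeof(int)>>>(dev);  // 3rd parameter: bytes of shared memory

    bool ok = (cudaGetLastError() == cudaSuccess);
    // This blocking copy also waits for both kernels to finish.
    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    for (int i = 0; i < n; ++i) ok = ok && (host[i] == i);
    return ok;
}

int main()
{
    printf("round trip %s\n", round_trip_ok() ? "succeeded" : "failed");
    return 0;
}
```

Launching reverse_dynamic without the third parameter would give the kernel zero bytes of dynamic shared memory, so the accesses to tile would be out of bounds.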

The kernel launch configuration parameters <<<...>>> always show up in the same order:

  1. the grid dimensions
  2. the threadblock dimensions
  3. the size of dynamically allocated shared memory (in bytes)
  4. the stream to launch the kernel in

Parameters 1 and 2 are mandatory; 3 and 4 are optional and default to 0 (no dynamic shared memory, and the default stream). But if you need to specify parameter 4 (the stream), you must also supply parameter 3, even if it is zero.

So the correct sequence is:

kernel_name<<<grid_dim, threadblock_dim, dynamic_shared_memory_size, stream>>>(...);
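As a rough illustration of a launch that uses all four parameters, the sketch below stages data in dynamic shared memory and runs in a non-default stream. The kernel double_via_shared, the helper run_in_stream, and the sizes are hypothetical choices for this example; only the order of the launch parameters is taken from the answer above.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread stages one element in dynamic shared
// memory, then writes back twice its value.
__global__ void double_via_shared(float *data, int n)
{
    extern __shared__ float buf[];        // sized by the 3rd launch parameter
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[threadIdx.x] = data[i];
    __syncthreads();                      // kept outside the branch so every thread reaches it
    if (i < n) data[i] = 2.0f * buf[threadIdx.x];
}

bool run_in_stream()
{
    const int n = 1024;
    const int block = 256;
    const int grid = (n + block - 1) / block;
    const size_t shmem = block * sizeof(float);   // one float per thread
    const size_t bytes = n * sizeof(float);

    std::vector<float> host(n);
    for (int i = 0; i < n; ++i) host[i] = static_cast<float>(i);

    cudaStream_t stream;
    if (cudaStreamCreate(&stream) != cudaSuccess) return false;

    float *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) {
        cudaStreamDestroy(stream);
        return false;
    }

    // The copies and the kernel are issued into the same stream,
    // so they execute in the order they were issued.
    cudaMemcpyAsync(dev, host.data(), bytes, cudaMemcpyHostToDevice, stream);
    double_via_shared<<<grid, block, shmem, stream>>>(dev, n);
    bool ok = (cudaGetLastError() == cudaSuccess);
    cudaMemcpyAsync(host.data(), dev, bytes, cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);        // wait until the stream's work is done

    cudaFree(dev);
    cudaStreamDestroy(stream);

    for (int i = 0; i < n; ++i) ok = ok && (host[i] == 2.0f * i);
    return ok;
}

int main()
{
    printf("stream launch %s\n", run_in_stream() ? "succeeded" : "failed");
    return 0;
}
```

If the kernel needed no dynamic shared memory, the launch would be written with 0 in the third position, which is exactly the form used in the book's example.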

You can read more about it in the CUDA documentation.



Source: https://stackoverflow.com/questions/27162408/shared-memory-and-streams-when-launching-kernel
