CUDA gridDim and blockDim

星月不相逢 2020-12-22 17:58

I get what blockDim is, but I have a problem with gridDim. blockDim gives the size of the block, but what is gridDim? On the Internet

4 answers

离开以前 (original poster)

2020-12-22 18:40
Paraphrased from the CUDA Programming Guide:

gridDim: This variable contains the dimensions of the grid.

blockIdx: This variable contains the block index within the grid.

blockDim: This variable contains the dimensions of the block.

threadIdx: This variable contains the thread index within the block.

You seem to be a bit confused about the thread hierarchy that CUDA has. In a nutshell, for a kernel there will be 1 grid (which I always visualize as a 3-dimensional cube). Each of its elements is a block, such that a grid declared as dim3 grid(10, 10, 2); would have 10*10*2 = 200 total blocks. In turn, each block is a 3-dimensional cube of threads.
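To make the grid/block hierarchy concrete, here is a minimal sketch that launches the `dim3 grid(10, 10, 2)` from the paragraph above and counts how many blocks and threads actually run. The kernel name `count_kernel`, the helper `count_blocks_and_threads`, and the block shape `(4, 4, 2)` are assumptions chosen for illustration; they are not from the original answer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Every thread bumps the thread counter; only the first thread of each
// block bumps the block counter, so each block is counted once.
__global__ void count_kernel(unsigned int *block_count, unsigned int *thread_count)
{
    atomicAdd(thread_count, 1u);
    if (threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0) {
        atomicAdd(block_count, 1u);
    }
}

// Hypothetical helper: launches the kernel with the given shapes and
// reports the counts. Returns false if a CUDA call fails.
bool count_blocks_and_threads(dim3 grid, dim3 block,
                              unsigned int *blocks, unsigned int *threads)
{
    unsigned int *d_counts = nullptr;  // d_counts[0] = blocks, d_counts[1] = threads
    if (cudaMalloc(&d_counts, 2 * sizeof(unsigned int)) != cudaSuccess) return false;
    cudaMemset(d_counts, 0, 2 * sizeof(unsigned int));

    count_kernel<<<grid, block>>>(d_counts, d_counts + 1);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    unsigned int h_counts[2] = {0, 0};
    cudaError_t err = cudaMemcpy(h_counts, d_counts, sizeof(h_counts),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_counts);
    if (err != cudaSuccess) return false;

    *blocks = h_counts[0];
    *threads = h_counts[1];
    return true;
}

int main()
{
    dim3 grid(10, 10, 2);   // 10 * 10 * 2 = 200 blocks
    dim3 block(4, 4, 2);    // 4 * 4 * 2 = 32 threads per block (illustrative choice)

    unsigned int blocks = 0, threads = 0;
    if (!count_blocks_and_threads(grid, block, &blocks, &threads)) {
        std::printf("CUDA call failed\n");
        return 1;
    }
    std::printf("blocks = %u, threads = %u\n", blocks, threads);
    return 0;
}
```

Compiled with `nvcc` and run on a machine with a working CUDA device, this should report 200 blocks and 200 * 32 = 6400 threads, matching the product of the grid and block dimensions.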

With that said, it's common to only use the x-dimension of the blocks and grids, which is what it looks like the code in your question is doing. This is especially relevant if you're working with 1D arrays. In that case, blockDim.x * gridDim.x is the total number of threads in your grid, because blockDim.x is the size of each block and gridDim.x is the total number of blocks. Your tid+=blockDim.x * gridDim.x line therefore advances each thread's index by one full grid of threads.

So if you launch a kernel with parameters
```
dim3 block_dim(128,1,1);
dim3 grid_dim(10,1,1);
kernel<<<grid_dim, block_dim>>>(...);
```
then inside your kernel, the expression threadIdx.x + blockIdx.x*blockDim.x is built from:

threadIdx.x in the range [0, 128)

blockIdx.x in the range [0, 10)

blockDim.x equal to 128

gridDim.x equal to 10

Hence in calculating threadIdx.x + blockIdx.x*blockDim.x, you would have values within the range defined by [0, 128) + 128 * [0, 10), which means your tid values would range over {0, 1, 2, ..., 1279}. This is useful when you want to map threads to tasks, as it provides a unique identifier for every thread in your kernel.
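The index arithmetic above can be checked with a small sketch in which every thread writes its own `tid` into an array. The names `write_global_index` and `indices_cover_range` are hypothetical, and the 10 x 128 launch shape is the one from the example; treat this as an illustration rather than a definitive implementation.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread stores its global index in the slot with that index.
__global__ void write_global_index(int *out, int n)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid < n) {  // guard in case the grid is larger than the array
        out[tid] = tid;
    }
}

// Hypothetical helper: returns true if every slot i in
// [0, blocks * threads_per_block) ends up holding the value i.
bool indices_cover_range(int blocks, int threads_per_block)
{
    const int n = blocks * threads_per_block;
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return false;
    cudaMemset(d_out, 0xFF, n * sizeof(int));  // every int becomes -1, so untouched slots show up

    write_global_index<<<blocks, threads_per_block>>>(d_out, n);

    std::vector<int> h_out(n);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i) {
        if (h_out[i] != i) return false;
    }
    return true;
}

int main()
{
    // 10 blocks of 128 threads: tid should take every value in 0..1279.
    bool ok = indices_cover_range(10, 128);
    std::printf("every index in 0..1279 produced: %s\n", ok ? "yes" : "no");
    return ok ? 0 : 1;
}
```

The `if (tid < n)` guard is not strictly needed here because the array length equals the thread count, but it is the usual safeguard when the array size is not a multiple of the block size.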

However, if you have
```
int tid = threadIdx.x + blockIdx.x * blockDim.x;
tid += blockDim.x * gridDim.x;
```
then you'll essentially have tid = [0, 128) + 128 * [0, 10) + (128 * 10), and your tid values would range over {1280, 1281, ..., 2559}. I'm not sure where that would be relevant, but it all depends on your application and how you map your threads to your data. This mapping is central to any kernel launch, and you're the one who determines how it should be done. When you launch your kernel you specify the grid and block dimensions, and you're the one who has to enforce the mapping to your data inside your kernel. The only constraint is that you don't exceed your hardware limits: on modern cards you can have at most 2^10 threads per block, and at most 2^16 - 1 blocks in the y- and z-dimensions of the grid (the x-dimension allows up to 2^31 - 1 blocks on compute capability 3.0 and later).
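One place where adding blockDim.x * gridDim.x to tid commonly appears is a grid-stride loop, where the array has more elements than the grid has threads and each thread processes several elements spaced one grid apart. The sketch below assumes that is the intent of the code in the question; the names `add_one` and `grid_stride_touches_all` are hypothetical.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Grid-stride loop: each thread starts at its global index and then jumps
// ahead by the total number of threads in the grid until it runs past n.
__global__ void add_one(int *data, int n)
{
    const int stride = blockDim.x * gridDim.x;
    for (int tid = threadIdx.x + blockIdx.x * blockDim.x; tid < n; tid += stride) {
        data[tid] += 1;
    }
}

// Hypothetical helper: runs add_one over n zero-initialized elements with a
// fixed 10 x 128 launch and returns true if every element was incremented
// exactly once.
bool grid_stride_touches_all(int n)
{
    int *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(int)) != cudaSuccess) return false;
    cudaMemset(d_data, 0, n * sizeof(int));

    add_one<<<10, 128>>>(d_data, n);  // 1280 threads, regardless of n

    std::vector<int> h_data(n);
    cudaError_t err = cudaMemcpy(h_data.data(), d_data, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i) {
        if (h_data[i] != 1) return false;
    }
    return true;
}

int main()
{
    // With n = 2560, each of the 1280 threads handles tid and tid + 1280.
    bool ok = grid_stride_touches_all(2560);
    std::printf("all 2560 elements updated once: %s\n", ok ? "yes" : "no");
    return ok ? 0 : 1;
}
```

With this pattern the launch configuration no longer has to match the data size: the same 10 x 128 launch covers 2560 elements, or any other n, because the loop keeps striding until tid reaches n.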