Cuda gridDim and blockDim

前端 未结 4 1312
星月不相逢
星月不相逢 2020-12-22 17:58

I get what blockDim is, but I have a problem with gridDim. Blockdim gives the size of the block, but what is gridDim? On the Internet

4条回答
  •  星月不相逢
    2020-12-22 18:35

    In this source code, we even have 4 threds, the kernel function can access all of 10 arrays. How?

    #define N 10 //(33*1024)
    
    __global__ void add(int *c){
        int tid = threadIdx.x + blockIdx.x * gridDim.x;
    
        if(tid < N)
            c[tid] = 1;
    
        while( tid < N)
        {
            c[tid] = 1;
            tid += blockDim.x * gridDim.x;
        }
    }
    
    int main(void)
    {
        int c[N];
        int *dev_c;
        cudaMalloc( (void**)&dev_c, N*sizeof(int) );
    
        for(int i=0; i>>(dev_c);
        cudaMemcpy(c, dev_c, N*sizeof(int), cudaMemcpyDeviceToHost );
    
        for(int i=0; i< N; ++i)
        {
            printf("c[%d] = %d \n" ,i, c[i] );
        }
    
        cudaFree( dev_c );
    }
    

    Why we do not create 10 threads ex) add<<<2,5>>> or add<5,2>>> Because we have to create reasonably small number of threads, if N is larger than 10 ex) 33*1024.

    This source code is example of this case. arrays are 10, cuda threads are 4. How to access all 10 arrays only by 4 threads.

    see the page about meaning of threadIdx, blockIdx, blockDim, gridDim in the cuda detail.

    In this source code,

    gridDim.x : 2    this means number of block of x
    
    gridDim.y : 1    this means number of block of y
    
    blockDim.x : 2   this means number of thread of x in a block
    
    blockDim.y : 1   this means number of thread of y in a block
    

    Our number of thread are 4, because 2*2(blocks * thread).

    In add kernel function, we can access 0, 1, 2, 3 index of thread

    ->tid = threadIdx.x + blockIdx.x * blockDim.x

    ①0+0*2=0

    ②1+0*2=1

    ③0+1*2=2

    ④1+1*2=3

    How to access rest of index 4, 5, 6, 7, 8, 9. There is a calculation in while loop

    tid += blockDim.x + gridDim.x in while
    

    ** first call of kernel **

    -1 loop: 0+2*2=4

    -2 loop: 4+2*2=8

    -3 loop: 8+2*2=12 ( but this value is false, while out!)

    ** second call of kernel **

    -1 loop: 1+2*2=5

    -2 loop: 5+2*2=9

    -3 loop: 9+2*2=13 ( but this value is false, while out!)

    ** third call of kernel **

    -1 loop: 2+2*2=6

    -2 loop: 6+2*2=10 ( but this value is false, while out!)

    ** fourth call of kernel **

    -1 loop: 3+2*2=7

    -2 loop: 7+2*2=11 ( but this value is false, while out!)

    So, all index of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 can access by tid value.

    refer to this page. http://study.marearts.com/2015/03/to-process-all-arrays-by-reasonably.html I cannot upload image, because low reputation.

提交回复
热议问题