CUDA shared memory array variable

一整个雨季 2020-12-13 07:56

I am trying to declare a variable for matrix multiplication as follows:

__shared__ float As[BLOCK_SIZE][BLOCK_SIZE];

I am trying to make it

3 answers
  •  抹茶落季
    2020-12-13 08:21

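    Declare the array as extern with no size, so that its size can be set at launch time:

    extern __shared__ int buf[];

    When you launch the kernel, pass the size in bytes as the third launch-configuration parameter:

    kernel<<<blocks, threads, numbytes_for_shared>>>(...);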
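    As a minimal sketch of a launch with a dynamically sized shared array, the kernel below reverses a small array inside one block. The names reverseInBlock and runReverse are hypothetical, chosen only for illustration, and the sketch assumes n is at most the maximum block size (typically 1024 threads).

```cuda
#include <cuda_runtime.h>

// Reverses n ints in place using a shared buffer whose size is not known
// at compile time. Assumes a single block with at least n threads.
__global__ void reverseInBlock(int *data, int n)
{
    extern __shared__ int buf[];   // sized by the third launch parameter
    int t = threadIdx.x;
    if (t < n) buf[t] = data[t];
    __syncthreads();               // every element is staged before reading
    if (t < n) data[t] = buf[n - 1 - t];
}

// Hypothetical host helper: runs the kernel and checks the result.
bool runReverse(int n)
{
    int *h = new int[n];
    for (int i = 0; i < n; ++i) h[i] = i;

    int *d = nullptr;
    cudaMalloc(&d, n * sizeof(int));
    cudaMemcpy(d, h, n * sizeof(int), cudaMemcpyHostToDevice);

    size_t sharedBytes = n * sizeof(int);          // bytes, not elements
    reverseInBlock<<<1, n, sharedBytes>>>(d, n);
    bool ok = (cudaGetLastError() == cudaSuccess);

    cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) ok = ok && (h[i] == n - 1 - i);

    cudaFree(d);
    delete[] h;
    return ok;
}
```

    Calling runReverse(256) from main should return true on a working device. Note that the third launch parameter is a byte count, so it is n * sizeof(int) rather than n.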

    Be careful if you have multiple extern declarations of shared arrays:

    extern __shared__ float As[];

    extern __shared__ float Bs[];

    All extern __shared__ arrays start at the same address, so As will point to the same memory as Bs.

    Instead, keep both As and Bs inside a single 1D array:

    extern __shared__ float smem[];

    When launching the kernel, pass 2*BLOCK_SIZE*BLOCK_SIZE*sizeof(float) as the shared memory size (the third launch parameter), so that there is room for both tiles.

    When indexing into As, use smem[y*BLOCK_SIZE+x], and when indexing into Bs, use smem[BLOCK_SIZE*BLOCK_SIZE+y*BLOCK_SIZE+x].
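    The layout above can be sketched as a tiled matrix multiplication in which both tiles live in one extern array. This is only an illustration under stated assumptions: the matrices are square, row-major, and n is a multiple of the tile width; matMulDynShared and runMatMul are hypothetical names; and the tile width is taken from blockDim.x instead of a compile-time BLOCK_SIZE.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// C = A * B for n x n row-major matrices. Assumes blockDim.x == blockDim.y
// and that n is a multiple of the tile width.
__global__ void matMulDynShared(const float *A, const float *B, float *C, int n)
{
    extern __shared__ float smem[];
    const int tile = blockDim.x;
    float *As = smem;                  // first tile*tile floats
    float *Bs = smem + tile * tile;    // next tile*tile floats

    const int x = threadIdx.x, y = threadIdx.y;
    const int row = blockIdx.y * tile + y;
    const int col = blockIdx.x * tile + x;

    float acc = 0.0f;
    for (int t = 0; t < n / tile; ++t) {
        As[y * tile + x] = A[row * n + (t * tile + x)];
        Bs[y * tile + x] = B[(t * tile + y) * n + col];
        __syncthreads();               // both tiles fully loaded
        for (int k = 0; k < tile; ++k)
            acc += As[y * tile + k] * Bs[k * tile + x];
        __syncthreads();               // finish reading before the next load
    }
    C[row * n + col] = acc;
}

// Hypothetical host helper: returns the max abs error against a CPU reference.
float runMatMul(int n, int tile)
{
    std::vector<float> hA(n * n), hB(n * n), hC(n * n, 0.0f);
    for (int i = 0; i < n * n; ++i) { hA[i] = float(i % 7); hB[i] = float(i % 5); }

    const size_t bytes = size_t(n) * n * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(tile, tile), grid(n / tile, n / tile);
    const size_t sharedBytes = 2 * size_t(tile) * tile * sizeof(float);
    matMulDynShared<<<grid, block, sharedBytes>>>(dA, dB, dC, n);
    cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);

    float maxErr = 0.0f;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            float ref = 0.0f;
            for (int k = 0; k < n; ++k) ref += hA[i * n + k] * hB[k * n + j];
            maxErr = std::fmax(maxErr, std::fabs(ref - hC[i * n + j]));
        }
    return maxErr;
}
```

    For example, runMatMul(64, 16) launches 16x16 blocks with 2*16*16*sizeof(float) = 2048 bytes of shared memory per block. Taking two pointers into smem keeps the indexing readable while still using a single extern declaration.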
