I am trying to declare a variable for matrix multiplication as follows:
__shared__ float As[BLOCK_SIZE][BLOCK_SIZE];
I am trying to make it
extern __shared__ int buf[];
When you launch the kernel, you should launch it this way:
kernel<<
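A minimal sketch of such a launch, assuming the standard CUDA execution configuration, where the third argument inside `<<<...>>>` is the number of bytes of dynamic shared memory per block. The names `reverseInBlock` and `reverseOnDevice` are hypothetical and exist only for illustration. Error checking is omitted, and the input is assumed to hold between 1 and 1024 elements so that it fits in one block.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: reverses n ints inside a single block, staging
// them in dynamically sized shared memory.
__global__ void reverseInBlock(int *data, int n)
{
    extern __shared__ int buf[];   // size is supplied at launch time
    int t = threadIdx.x;
    if (t < n) buf[t] = data[t];
    __syncthreads();               // all loads finish before any thread reads buf
    if (t < n) data[t] = buf[n - 1 - t];
}

// Hypothetical host helper; assumes 1 <= in.size() <= 1024.
std::vector<int> reverseOnDevice(const std::vector<int> &in)
{
    int n = static_cast<int>(in.size());
    size_t bytes = n * sizeof(int);

    int *d = nullptr;
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, in.data(), bytes, cudaMemcpyHostToDevice);

    // grid = 1 block, block = n threads, dynamic shared memory = bytes
    reverseInBlock<<<1, n, bytes>>>(d, n);

    std::vector<int> out(n);
    cudaMemcpy(out.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return out;
}
```

Because the shared memory size is an ordinary runtime value here, it can come from user input instead of a compile-time constant.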
If you have multiple extern declarations of shared memory:
extern __shared__ float As[];
extern __shared__ float Bs[];
this will lead to As pointing to the same address as Bs.
You will need to keep both As and Bs inside a single 1D array:
extern __shared__ float smem[];
When calling the kernel, you should launch it with 2*BLOCK_SIZE*BLOCK_SIZE*sizeof(float) bytes of shared memory.
When indexing into As, use smem[y*BLOCK_SIZE+x],
and when indexing into Bs, use smem[BLOCK_SIZE*BLOCK_SIZE+y*BLOCK_SIZE+x].
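Putting the pieces together, here is a hedged sketch of a tiled matrix multiplication that carves both tiles out of one dynamic shared array. It assumes square, row-major n x n matrices. The names `matMulTiled` and `matMul`, and the runtime parameter `blockSize` (standing in for BLOCK_SIZE), are hypothetical. Error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: C = A * B for square n x n row-major matrices.
// blockSize is a runtime value, so both tiles share one dynamic array.
__global__ void matMulTiled(const float *A, const float *B, float *C,
                            int n, int blockSize)
{
    extern __shared__ float smem[];
    float *As = smem;                          // smem[y*blockSize + x]
    float *Bs = smem + blockSize * blockSize;  // smem[blockSize*blockSize + y*blockSize + x]

    int x = threadIdx.x, y = threadIdx.y;
    int row = blockIdx.y * blockSize + y;
    int col = blockIdx.x * blockSize + x;
    float acc = 0.0f;

    for (int t = 0; t < (n + blockSize - 1) / blockSize; ++t) {
        int aCol = t * blockSize + x;
        int bRow = t * blockSize + y;
        // Out-of-range elements are loaded as zero so partial tiles work.
        As[y * blockSize + x] = (row < n && aCol < n) ? A[row * n + aCol] : 0.0f;
        Bs[y * blockSize + x] = (bRow < n && col < n) ? B[bRow * n + col] : 0.0f;
        __syncthreads();
        for (int k = 0; k < blockSize; ++k)
            acc += As[y * blockSize + k] * Bs[k * blockSize + x];
        __syncthreads();   // finish with this tile before overwriting it
    }
    if (row < n && col < n) C[row * n + col] = acc;
}

// Hypothetical host helper; assumes blockSize * blockSize <= 1024.
std::vector<float> matMul(const std::vector<float> &A,
                          const std::vector<float> &B, int n, int blockSize)
{
    size_t bytes = static_cast<size_t>(n) * n * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);

    dim3 threads(blockSize, blockSize);
    int tiles = (n + blockSize - 1) / blockSize;
    dim3 grid(tiles, tiles);
    // Room for both tiles: 2 * blockSize * blockSize floats.
    size_t smemBytes = 2 * static_cast<size_t>(blockSize) * blockSize * sizeof(float);
    matMulTiled<<<grid, threads, smemBytes>>>(dA, dB, dC, n, blockSize);

    std::vector<float> C(static_cast<size_t>(n) * n);
    cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

Using the two pointers `As` and `Bs` is only a convenience; indexing `smem` directly with the two offsets gives the same result.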