I am experimenting with shared memory in CUDA, and I do not understand its behaviour in this bit of code. I have a pretty basic kernel:
__global__ void sum
If you use the extern qualifier, you need to pass the size of the shared memory as the third launch parameter when launching the kernel:
kernel<<< blocks, threads, size>>>(...)
The size parameter is the size of the shared memory in bytes, per block.
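To make the launch parameter concrete, here is a minimal sketch of a kernel that uses an extern shared array. It is not the kernel from the question (that one is cut off above); the name blockSum, the input of all ones, and the sizes n = 1024 and threads = 256 are assumptions chosen only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (not the one from the question): each block sums its
// slice of `in` into out[blockIdx.x] using dynamically sized shared memory.
__global__ void blockSum(const int *in, int *out, int n)
{
    // No size here: it comes from the third <<<...>>> launch parameter.
    extern __shared__ int cache[];

    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    cache[tid] = (i < n) ? in[i] : 0;
    __syncthreads();

    // Tree reduction in shared memory; assumes blockDim.x is a power of two.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            cache[tid] += cache[tid + stride];
        __syncthreads();
    }

    if (tid == 0)
        out[blockIdx.x] = cache[0];
}

int main()
{
    const int n = 1024;       // assumed problem size
    const int threads = 256;  // assumed block size (power of two)
    const int blocks = (n + threads - 1) / threads;
    const size_t size = threads * sizeof(int);  // shared memory per block, in bytes

    int h_in[n], h_out[blocks];
    for (int i = 0; i < n; ++i)
        h_in[i] = 1;

    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, blocks * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

    // Third launch parameter: bytes of dynamic shared memory for each block.
    blockSum<<<blocks, threads, size>>>(d_in, d_out, n);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(h_out, d_out, blocks * sizeof(int), cudaMemcpyDeviceToHost);

    int total = 0;
    for (int b = 0; b < blocks; ++b)
        total += h_out[b];
    printf("total = %d (expected %d)\n", total, n);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

If the third parameter is left out, it defaults to 0, so the extern array has no storage behind it and any access to cache[] is out of bounds. That is a common reason for shared memory appearing to misbehave.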