I believe my CUDA application could potentially benefit from shared memory, in order to keep the data near the GPU cores. Right now, I have a single kernel to which I pass a poi