Assume we have an M x N array A. I would like to modify it by running a kernel with M independent GPU threads, each executing
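The question breaks off before it says what each thread does. The sketch below is only for illustration. It assumes that thread i owns row i of A, that A is stored row-major, and that each thread scales its row by a constant. The kernel `scaleRows`, the host helper `scaleRowsOnDevice`, and the scaling operation are all hypothetical stand-ins for the real per-row work. The part that carries over is the launch shape: M threads, one per row, spread over ceil(M / 256) blocks, with a bounds check for the surplus threads in the last block.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: thread `row` owns row `row` of the row-major M x N array A
// and scales every element of that row by `factor` (placeholder for the real work).
__global__ void scaleRows(float *A, int M, int N, float factor)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index = row index
    if (row >= M) return;                              // last block may contain surplus threads

    float *rowPtr = A + static_cast<size_t>(row) * N;  // row-major: row starts at offset row * N
    for (int j = 0; j < N; ++j) {
        rowPtr[j] *= factor;
    }
}

// Hypothetical host helper: copies A to the GPU, launches one thread per row,
// and copies the modified array back. Assumes A.size() == M * N.
std::vector<float> scaleRowsOnDevice(std::vector<float> A, int M, int N, float factor)
{
    const size_t bytes = A.size() * sizeof(float);
    float *d_A = nullptr;
    cudaMalloc(&d_A, bytes);
    cudaMemcpy(d_A, A.data(), bytes, cudaMemcpyHostToDevice);

    const int threadsPerBlock = 256;
    const int blocks = (M + threadsPerBlock - 1) / threadsPerBlock;  // ceil(M / 256) blocks
    scaleRows<<<blocks, threadsPerBlock>>>(d_A, M, N, factor);

    cudaError_t err = cudaGetLastError();               // reports launch-configuration errors
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel launch failed: %s\n", cudaGetErrorString(err));
    }
    cudaDeviceSynchronize();                            // wait for all M threads to finish

    cudaMemcpy(A.data(), d_A, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_A);
    return A;
}

int main()
{
    const int M = 1000, N = 64;
    std::vector<float> A(static_cast<size_t>(M) * N, 1.0f);
    A = scaleRowsOnDevice(A, M, N, 2.0f);
    std::printf("A[0][0] = %f, A[M-1][N-1] = %f\n", A[0], A[A.size() - 1]);
    return 0;
}
```

One consequence of giving each thread a whole row of a row-major array: at each loop step, neighbouring threads read addresses N floats apart, so their global-memory accesses are not coalesced. If the problem allows it, storing A column-major (so that thread i reads element (i, j) next to thread i+1's element (i+1, j)) or using one thread per element usually gives better memory throughput.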