Hello, I'm trying to write a CUDA kernel to perform the following piece of code.
for (n = 0; n < (total-1); n++) { a = values[n]; for ( i = n+1; i &
I may well be way off, but the n < (total-1) check in
for (int n = idx; n < (total-1); n += blockDim.x*gridDim.x)
seems different from the original version.
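For what it's worth, one way to check this without a GPU is to simulate the loop on the host. The sketch below is plain C++, not device code, and it rests on an assumption the post does not show: that idx is computed the usual way as blockIdx.x*blockDim.x + threadIdx.x. The names grid_stride_visits and matches_serial_loop are made up for illustration. It only looks at which values of n the outer loop touches, since the inner loop is cut off in the post.

```cpp
#include <cassert>
#include <vector>

// Host-side simulation (not device code): count how often each outer index n
// is visited when every simulated thread runs
//   for (int n = idx; n < (total-1); n += blockDim.x*gridDim.x)
// block_dim and grid_dim stand in for blockDim.x and gridDim.x.
// ASSUMPTION: idx = blockIdx.x * blockDim.x + threadIdx.x.
std::vector<int> grid_stride_visits(int total, int block_dim, int grid_dim) {
    assert(block_dim > 0 && grid_dim > 0);  // a zero stride would never terminate
    std::vector<int> visits(total > 1 ? total - 1 : 0, 0);
    const int stride = block_dim * grid_dim;
    for (int block = 0; block < grid_dim; ++block) {
        for (int thread = 0; thread < block_dim; ++thread) {
            const int idx = block * block_dim + thread;
            for (int n = idx; n < total - 1; n += stride) {
                ++visits[n];
            }
        }
    }
    return visits;
}

// True when every n in [0, total-1) is visited exactly once, i.e. the
// simulated threads together cover the same n values as the serial loop.
bool matches_serial_loop(int total, int block_dim, int grid_dim) {
    for (int v : grid_stride_visits(total, block_dim, grid_dim)) {
        if (v != 1) return false;
    }
    return true;
}
```

Under that assumption the upper bound is the same condition as in the serial loop. What changes is the start (idx instead of 0) and the step (the total thread count instead of 1), so each thread handles a different slice of the n values.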