Coalesced global memory writes using hash
Question

My question concerns coalesced global writes to a dynamically changing set of elements of an array in CUDA. Consider the following kernel:

    __global__ void kernel(int n, int *odata, int *idata, int *hash)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            odata[hash[i]] = idata[i];
    }

Here the first n elements of the array hash contain the indices of odata to be updated from the first n elements of idata. Obviously this leads to a terrible, terrible lack of coalescence. In the