Question
I have a 2D matrix in main. I want to transfer it from the host to the device. Can you tell me how I can allocate memory for it and transfer it to device memory?
#define N 5
__global__ void kernel(int a[N][N]){
}
int main(void){
int a[N][N];
cudaMalloc(?);
cudaMemcpy(?);
kernel<<<N,N>>>(?);
}
Answer 1:
Perhaps something like this is what you really had in mind:
#define N 5

__global__ void kernel(int *a)
{
    // Thread indexing within the grid - note that the built-in
    // thread indices are column major (x varies fastest).
    int tidx = threadIdx.x + blockIdx.x * blockDim.x;
    int tidy = threadIdx.y + blockIdx.y * blockDim.y;

    // a_ij = a[i][j] with i = tidx, j = tidy, where a is stored
    // in row major order on both host and device.
    if (tidx < N && tidy < N) {
        int a_ij = a[tidy + tidx * N];
        (void)a_ij; // illustrative only; silences the unused-variable warning
    }
}

int main(void)
{
    int a[N][N], *a_device;
    const size_t a_size = sizeof(int) * size_t(N * N);

    cudaMalloc((void **)&a_device, a_size);
    cudaMemcpy(a_device, a, a_size, cudaMemcpyHostToDevice);

    // Launch one N x N block so (tidx, tidy) covers every element;
    // a 1D <<<N,N>>> launch would leave tidy permanently at zero.
    kernel<<<1, dim3(N, N)>>>(a_device);

    cudaFree(a_device);
    return 0;
}
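If you also want to bring results back and confirm the transfer worked, a minimal round-trip sketch might look like the following. The kernel name fill_kernel and the idea of writing i*N + j into each element are illustrative assumptions, not part of the original answer; the calls themselves are standard CUDA runtime API.

#include <cstdio>

#define N 5

// Illustrative kernel (not from the answer): each thread writes its
// flattened index so the copied-back result is easy to verify.
__global__ void fill_kernel(int *a)
{
    int i = threadIdx.x; // row
    int j = threadIdx.y; // column
    if (i < N && j < N)
        a[j + i * N] = i * N + j;
}

int main(void)
{
    int a[N][N] = {0}, *a_device;
    const size_t a_size = sizeof(int) * size_t(N * N);

    cudaMalloc((void **)&a_device, a_size);
    cudaMemcpy(a_device, a, a_size, cudaMemcpyHostToDevice);

    fill_kernel<<<1, dim3(N, N)>>>(a_device);

    // Copy the flat device buffer straight back into the 2D host array.
    cudaMemcpy(a, a_device, a_size, cudaMemcpyDeviceToHost);

    // Report any error left behind by the launch or the copies.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        printf("CUDA error: %s\n", cudaGetErrorString(err));

    printf("a[2][3] = %d (expected %d)\n", a[2][3], 2 * N + 3);

    cudaFree(a_device);
    return 0;
}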
The point you might have missed is that when you statically declare an array like a[N][N], it is really just a row-major ordered piece of linear memory. The compiler automatically converts between a[i][j] and a[j + i*N] when it emits code. On the GPU, you must use the second form of access to read the memory you copy from the host.
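As a quick host-side check of that equivalence (a plain C sketch, not from the answer), you can compare the address of a[i][j] with the flat offset j + i*N:

#include <assert.h>

#define N 5

int main(void)
{
    int a[N][N];
    int *flat = &a[0][0];

    // Every a[i][j] lives at flat offset j + i*N, i.e. the static
    // 2D array really is one row-major block of linear memory.
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            assert(&a[i][j] == &flat[j + i * N]);

    return 0;
}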
Source: https://stackoverflow.com/questions/9373929/cuda-transfer-2d-array-from-host-to-device