How do I pass a 2-dimensional array into a kernel in pycuda?

Submitted by 耗尽温柔 on 2019-12-05 04:17:28

Question


I found an answer here, but it is not clear whether I should reshape the array. Do I need to reshape a 2D array into 1D before passing it to a PyCUDA kernel?


Answer 1:


There is no need to reshape a 2D gpuarray in order to pass it to a CUDA kernel.
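As a quick host-side check of this claim, the following sketch (NumPy only, no GPU required) shows that flattening a default, C-contiguous 2D array produces a view over exactly the same bytes, so reshaping would not change what gets copied to the device. The variable names are illustrative, and the check assumes the array is C-contiguous, which is the NumPy default.

```python
import numpy as np

a = np.zeros((10, 10), dtype=np.float32)   # C-contiguous (row-major) by default
flat = a.ravel()                           # for a contiguous array this is a view, not a copy

# Both objects refer to the same underlying buffer, byte for byte.
same_buffer = np.shares_memory(a, flat)
same_bytes = a.tobytes() == flat.tobytes()
print(same_buffer, same_bytes)
```

For arrays that are not C-contiguous (for example a transposed view), this equivalence does not hold, and the strides have to be taken into account.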

As I said in the answer you linked to, a 2D numpy or PyCUDA array is just an allocation of pitched linear memory, stored in row-major order by default. Both expose two attributes that tell you everything you need to access the array: shape and strides. For example:

In [8]: X = np.arange(0, 15, dtype=np.int32).reshape((5, 3))

In [9]: print(X.shape)
(5, 3)

In [10]: print(X.strides)
(12, 4)
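To make the meaning of these two values concrete, here is a minimal NumPy-only sketch that turns an index pair (i, j) into a byte offset using strides and reads the element back from the raw buffer. The helper name byte_offset is hypothetical, and the dtype is fixed to int32 so that the strides match the (12, 4) printed above.

```python
import numpy as np

X = np.arange(0, 15, dtype=np.int32).reshape((5, 3))

def byte_offset(arr, i, j):
    """Byte offset of arr[i, j] from the start of the buffer (hypothetical helper)."""
    return i * arr.strides[0] + j * arr.strides[1]

raw = X.tobytes()                 # the linear memory backing the array
i, j = 3, 2
off = byte_offset(X, i, j)        # 3 * 12 + 2 * 4 = 44
value = np.frombuffer(raw, dtype=X.dtype, count=1, offset=off)[0]
print(off, value, X[i, j])        # prints: 44 11 11
```

This is the same arithmetic a kernel has to perform on the device pointer: step rows by the byte pitch, then step columns by the element size.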

The shape is self-explanatory; strides gives the step, in bytes, between consecutive elements along each dimension, so its first entry is the pitch of the rows. The best practice for kernel code is to treat the pointer supplied by PyCUDA as if it had been allocated with cudaMallocPitch, and to treat the first element of strides as the byte pitch of the rows in memory. A trivial example might look like this:

import pycuda.driver as drv
from pycuda.compiler import SourceModule
import pycuda.autoinit  # creates a CUDA context on import
import numpy as np

mod = SourceModule("""
__global__ void diag_kernel(float *dest, int stride, int N)
{
    const int tid = threadIdx.x + blockDim.x * blockIdx.x;

    if (tid < N) {
        // Step tid rows down using the byte pitch, then tid floats across
        float* p = (float*)((char*)dest + tid*stride) + tid;
        *p = 1.0f;
    }
}
""")

diag_kernel = mod.get_function("diag_kernel")

a = np.zeros((10,10), dtype=np.float32)
a_N = np.int32(a.shape[0])           # number of rows
a_stride = np.int32(a.strides[0])    # row pitch in bytes
a_bytes = a.size * a.dtype.itemsize
a_gpu = drv.mem_alloc(a_bytes)
drv.memcpy_htod(a_gpu, a)            # copy the 2D array as-is, no reshape
diag_kernel(a_gpu, a_stride, a_N, block=(32,1,1))
drv.memcpy_dtoh(a, a_gpu)

print(a)

Here some memory is allocated on the device, a zeroed 2D array is copied to that allocation directly, and the result of the kernel (setting the diagonal to 1) is copied back to the host and printed. It isn't necessary to flatten or otherwise modify the shape or memory layout of the 2D numpy data at any point in the process. The result is:

$ cuda-memcheck python ./gpuarray.py 
========= CUDA-MEMCHECK
[[ 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  1.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  1.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  1.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  1.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]]
========= ERROR SUMMARY: 0 errors
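For readers without a GPU at hand, the pointer arithmetic in diag_kernel can be traced on the host. The sketch below is only an emulation under stated assumptions: a bytearray stands in for the device allocation, a Python loop stands in for the 32 threads of the single block, and struct.pack_into plays the role of the float store. It is not how PyCUDA executes the kernel, just the same address calculation.

```python
import struct
import numpy as np

a = np.zeros((10, 10), dtype=np.float32)
N = a.shape[0]
stride = a.strides[0]              # row pitch in bytes (40 for 10 float32 values)

buf = bytearray(a.tobytes())       # stand-in for the device allocation

for tid in range(32):              # one iteration per thread of block=(32,1,1)
    if tid < N:                    # same bounds check as the kernel
        # (char*)dest + tid*stride moves down tid rows;
        # the trailing "+ tid" on a float* moves across tid elements.
        offset = tid * stride + tid * a.dtype.itemsize
        struct.pack_into("f", buf, offset, 1.0)

result = np.frombuffer(bytes(buf), dtype=np.float32).reshape(a.shape)
print(result)
```

The resulting array has ones on the diagonal and zeros elsewhere, matching the device output shown above, and the 2D shape is only reapplied on the host when interpreting the bytes.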


Source: https://stackoverflow.com/questions/19850836/how-do-i-pass-a-2-dimensional-array-into-a-kernel-in-pycuda
