Question
How can I create an array in shared memory without modifying the kernel using templates, as seen in the official examples? Or is using templates the official way?
In PyOpenCL I can create an array in local memory by setting a kernel argument:
kernel.set_arg(1,numpy.uint32(a_width))
...
KERNEL_CODE = """
__kernel void matrixMul(__local float* A_temp,...)
{ ...} """
Answer 1:
CUDA supports dynamic shared memory allocation at kernel run time, but the mechanism is a bit different from OpenCL's. In the CUDA runtime API, a kernel that uses dynamically sized shared memory, and the launch that sets its size, use the following syntax:
__global__ void kernel(...)
{
extern __shared__ typename buffer[];
....
}
....
kernel <<< griddim, blockdim, sharedmem, streamID >>> (...)
where sharedmem is the total number of bytes per block that will be allocated to buffer.
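Because sharedmem is a byte count rather than an element count, it can help to compute it explicitly on the host. The following is a minimal sketch in Python; the helper name shared_bytes is hypothetical and not part of CUDA or PyCUDA, and it assumes the buffer holds plain C scalar types.

```python
import ctypes


def shared_bytes(num_elements, ctype=ctypes.c_float):
    """Return the byte count to pass as the dynamic shared memory size.

    Hypothetical helper: num_elements is how many items of `ctype`
    each block stores in its extern __shared__ buffer.
    """
    return num_elements * ctypes.sizeof(ctype)


# A block that stages 256 floats needs 256 * 4 bytes.
print(shared_bytes(256))                   # 1024
# The same element count as doubles needs twice as much.
print(shared_bytes(256, ctypes.c_double))  # 2048
```

Passing the element count instead of the byte count would under-allocate the buffer, which is an easy mistake to make when porting from code that sizes arrays by length.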
In PyCUDA, the same mechanism works something like this:
mod = SourceModule("""
__global__ void kernel(...)
{
extern __shared__ typename buffer[];
....
}
""")
func = mod.get_function("kernel")
func.prepare(..., shared=sharedmem)
func.prepared_call(griddim,blockdim,...)
with the shared memory allocation size passed to the prepare method.
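For a more concrete picture, here is a hedged sketch of a complete PyCUDA round trip. It is not the answer's exact code: it uses the direct call form of a PyCUDA function with the shared keyword instead of prepare/prepared_call, and the kernel (reverse), the function name reverse_on_gpu, and the single-block launch are all assumptions chosen for illustration. It needs a CUDA-capable GPU with PyCUDA and NumPy installed, and the input must fit in one thread block.

```python
KERNEL_SOURCE = """
__global__ void reverse(float *data, int n)
{
    // Sized at launch time by the `shared` byte count, not in the source.
    extern __shared__ float buffer[];
    int t = threadIdx.x;
    if (t < n) buffer[t] = data[t];
    __syncthreads();
    if (t < n) data[t] = buffer[n - 1 - t];
}
"""


def reverse_on_gpu(values):
    """Reverse `values` on the GPU, staging them in dynamic shared memory."""
    # Imported here so the module can be loaded without a GPU present.
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    data = np.ascontiguousarray(values, dtype=np.float32).copy()
    n = data.size
    func = SourceModule(KERNEL_SOURCE).get_function("reverse")
    # One block of n threads; `shared` is the buffer size in bytes.
    func(drv.InOut(data), np.int32(n),
         block=(n, 1, 1), grid=(1, 1), shared=data.nbytes)
    return data
```

On a suitable machine, reverse_on_gpu([1.0, 2.0, 3.0]) should hand back the same values in reverse order. The kernel source itself never mentions a size, which is the point: only the launch decides how large buffer is.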
Answer 2:
I do not fully understand the question. I do not work with Python, but I know OpenCL quite well.
In OpenCL you have two possibilities to create shared/local memory buffers:
1) Add a kernel parameter, as you have in your question.
2) Define a buffer statically within the kernel itself, like:
__local float buffer[1024];
There are no other ways to do this in OpenCL. How you build the kernel code string that you pass to OpenCL is a separate question and is specific to Python. I am not an expert on this.
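To illustrate the first option from the Python side, here is a hedged sketch using PyOpenCL, where a __local pointer parameter is sized by passing a pyopencl.LocalMemory object that carries the byte count. The kernel (reverse), the function name reverse_on_device, and the single work-group launch are assumptions made for this example, not code from the question. It needs PyOpenCL, NumPy, and an OpenCL device, and the input must fit in one work-group.

```python
KERNEL_SOURCE = """
__kernel void reverse(__global float *data, __local float *scratch, int n)
{
    int t = get_local_id(0);
    if (t < n) scratch[t] = data[t];
    barrier(CLK_LOCAL_MEM_FENCE);
    if (t < n) data[t] = scratch[n - 1 - t];
}
"""


def reverse_on_device(values):
    """Reverse `values` on an OpenCL device via a __local scratch buffer."""
    # Imported here so the module can be loaded without an OpenCL runtime.
    import numpy as np
    import pyopencl as cl

    host = np.ascontiguousarray(values, dtype=np.float32)
    n = host.size
    ctx = cl.create_some_context(interactive=False)
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags
    buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=host)
    program = cl.Program(ctx, KERNEL_SOURCE).build()
    # LocalMemory takes a size in bytes and fills the __local parameter.
    program.reverse(queue, (n,), (n,), buf,
                    cl.LocalMemory(host.nbytes), np.int32(n))
    result = np.empty_like(host)
    cl.enqueue_copy(queue, result, buf)
    return result
```

As with the CUDA variant, the kernel source stays the same for every buffer size; only the argument passed at launch changes.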
Source: https://stackoverflow.com/questions/6468132/create-arrays-in-shared-memory-w-o-templates-like-in-pyopencl