How to pass and access C++ vectors to OpenCL kernel?

Submitted by 落爺英雄遲暮 on 2019-11-30 05:06:31
  1. You have to allocate an OpenCL buffer and copy your host data into it. An OpenCL buffer has a fixed size. You can either recreate it whenever your data size changes, or make it "big enough" up front and use only part of it when less memory is needed. For example, to create a buffer for b and copy all of its data to the device in one step:

    cl_int errorcode;
    cl_mem b_buffer = clCreateBuffer(
        context,                                  // OpenCL context
        CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,  // Kernel only reads it;
                                                  // copy data from host now
        sizeof(cl_double) * b.size(),             // Buffer size in bytes (b must be non-empty)
        b.data(),                                 // Pointer to the data to copy
        &errorcode);                              // Return code
    // Check errorcode == CL_SUCCESS before using b_buffer
    

    It is also possible to let OpenCL use the host memory directly (CL_MEM_USE_HOST_PTR), but this imposes constraints on alignment and on how you may access the host memory after the buffer has been created. While the buffer is not mapped (e.g. with clEnqueueMapBuffer), the host memory may contain stale or undefined data.

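  2. It depends. Do all the inner vectors (the second dimension) have the same length? If so, just flatten them into one contiguous array when uploading them to the OpenCL device, so that element (i, j) is stored at index i * n + j, where n is the common inner length. Otherwise it gets more complicated.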

  3. Declare buffer arguments as __global pointers in your kernel. For example, __global double *b would be appropriate for the buffer created in step 1. You can then use ordinary array notation (b[i]) in the kernel to access individual elements of the buffer.

  4. You cannot query the buffer size from within the kernel, so you have to pass it as an extra argument. The size can also be implicit. For example, if you launch exactly one work item per element of b, then get_global_size(0) equals the size of b.
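
For the host-side preparation in steps 1 and 2, flattening a std::vector<std::vector<double>> might look like the minimal sketch below. flatten_rectangular, flatten_ragged and RaggedFlat are hypothetical names. The ragged variant uses a concatenated value array plus a row-offset array, which is one common way to handle the "more complicated" case and not necessarily what the original answer had in mind. The flattened vector's data() pointer and its sizeof(cl_double) * size() byte count would take the place of b's pointer and size in the clCreateBuffer call of step 1.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Rectangular case: every inner vector has the same length `cols`.
// Rows are copied back to back (row-major), so element (i, j) ends up at
// flat[i * cols + j].
std::vector<double> flatten_rectangular(const std::vector<std::vector<double>>& rows,
                                        std::size_t& cols) {
    cols = rows.empty() ? 0 : rows.front().size();
    std::vector<double> flat;
    flat.reserve(rows.size() * cols);
    for (const auto& row : rows) {
        if (row.size() != cols) {
            throw std::invalid_argument("rows differ in length; use flatten_ragged");
        }
        flat.insert(flat.end(), row.begin(), row.end());
    }
    return flat;
}

// Ragged case (hypothetical helper): concatenate all rows and record where
// each row starts. Row i occupies values[offsets[i]] up to, but not
// including, values[offsets[i + 1]], so offsets has rows.size() + 1 entries.
struct RaggedFlat {
    std::vector<double> values;         // would be uploaded as __global double*
    std::vector<std::int32_t> offsets;  // would be uploaded as __global int*
};

RaggedFlat flatten_ragged(const std::vector<std::vector<double>>& rows) {
    RaggedFlat out;
    out.offsets.reserve(rows.size() + 1);
    out.offsets.push_back(0);
    for (const auto& row : rows) {
        out.values.insert(out.values.end(), row.begin(), row.end());
        out.offsets.push_back(static_cast<std::int32_t>(out.values.size()));
    }
    return out;
}
```

In the ragged case the kernel would receive both buffers and read element j of row i as values[offsets[i] + j], valid for j < offsets[i + 1] - offsets[i].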

A kernel which can access all of the data for the computation could look like this:

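    // Double precision is optional in OpenCL; OpenCL 1.0/1.1 devices
    // need this extension pragma before double can be used.
    #pragma OPENCL EXTENSION cl_khr_fp64 : enable

    __kernel void foo(long x, double y, double a, __global double* b, int b_size,
                      __global long* c, __global double* d,
                      __global double* result) {
      // Here be dragons
      *result = 0.0;
    }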
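
The kernel declares c as __global long*. In OpenCL C, long is always a signed 64-bit integer, but C++ long is not: on 64-bit Windows it is 32 bits. If c lives on the host as a std::vector<long> (an assumption, since the question's declarations are not shown here), copying it byte for byte could make the kernel read garbage. The minimal sketch below widens the elements first. to_cl_long is a hypothetical helper, and std::int64_t stands in for cl_long so the snippet compiles without the OpenCL headers.

```cpp
#include <cstdint>
#include <vector>

// OpenCL C defines `long` as a signed 64-bit integer (cl_long on the host),
// while C++ `long` is only 32 bits on some platforms, e.g. 64-bit Windows.
// Widening every element to std::int64_t gives the 8-byte layout a
// `__global long*` kernel argument expects, whatever the host's `long` is.
static_assert(sizeof(std::int64_t) == 8, "cl_long is 8 bytes");

template <typename Int>
std::vector<std::int64_t> to_cl_long(const std::vector<Int>& src) {
    // The iterator-range constructor converts each element to std::int64_t.
    return std::vector<std::int64_t>(src.begin(), src.end());
}
```

The upload would then use c64.data() and sizeof(cl_long) * c64.size(). For the same reason, the scalar x must be a cl_long on the host, matching sizeof(cl_long) in its clSetKernelArg call.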

Note that you also have to allocate a buffer for the result. You may also need to pass additional size arguments, e.g. for c and d. You would then call the kernel as follows:

    // Create/fill buffers
    // ...

    // Set arguments (indices follow the kernel's parameter order;
    // each call returns a cl_int error code worth checking)
    clSetKernelArg(kernel, 0, sizeof(cl_long), &x);
    clSetKernelArg(kernel, 1, sizeof(cl_double), &y);
    clSetKernelArg(kernel, 2, sizeof(cl_double), &a);
    clSetKernelArg(kernel, 3, sizeof(cl_mem), &b_buffer);
    cl_int b_size = static_cast<cl_int>(b.size());
    clSetKernelArg(kernel, 4, sizeof(cl_int), &b_size);
    clSetKernelArg(kernel, 5, sizeof(cl_mem), &c_buffer);
    clSetKernelArg(kernel, 6, sizeof(cl_mem), &d_buffer);
    clSetKernelArg(kernel, 7, sizeof(cl_mem), &result_buffer);

    // Enqueue kernel. The work size depends on your domain; for
    // illustration this assumes one work item per element of b.
    size_t global_size = b.size();
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size, NULL,
                           0, NULL, NULL);

    // Read back result (CL_TRUE makes this a blocking read)
    cl_double result;
    clEnqueueReadBuffer(queue, result_buffer, CL_TRUE, 0, sizeof(cl_double), &result,
                        0, NULL, NULL);
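
The implicit size from point 4 only works when the launch has exactly b.size() work items. If you also pass an explicit work-group (local) size to clEnqueueNDRangeKernel, the global size usually has to be padded up to a multiple of it. Here is a minimal sketch of that padding for a one-dimensional launch. round_up_global_size is a hypothetical helper name.

```cpp
#include <cstddef>
#include <stdexcept>

// Round n up to the next multiple of the work-group size. OpenCL 1.x
// requires global_work_size to be divisible by local_work_size whenever a
// local size is passed, so the padded launch may start up to
// local_size - 1 extra work items. The kernel then has to ignore them,
// e.g. in OpenCL C:
//     int i = get_global_id(0);
//     if (i >= b_size) return;
std::size_t round_up_global_size(std::size_t n, std::size_t local_size) {
    if (local_size == 0) {
        throw std::invalid_argument("local_size must be positive");
    }
    return ((n + local_size - 1) / local_size) * local_size;
}
```

On the host this might be used as `size_t local = 64; size_t global = round_up_global_size(b.size(), local);` followed by `clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, &local, 0, NULL, NULL);`. The value 64 is only a placeholder, because the usable work-group size depends on the device and the kernel (see CL_KERNEL_WORK_GROUP_SIZE). Because the launch no longer has exactly b.size() work items, b_size has to be passed explicitly.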