Process strings from OpenCL kernel

喜欢而已 submitted on 2021-01-29 07:22:46

Question


There are several strings like

std::string first, second, third; ...

My plan was to collect their addresses into a char* array:

char *addresses[] = {&first[0], &second[0], &third[0]}; ...

and pass the char **addresses to the OpenCL kernel.

There are several problems or questions:

The main issue is that I cannot pass an array of pointers.

Is there any good way to use a large number of strings from the kernel code without copying them, leaving them in shared memory?

I'm using an NVIDIA GPU on Windows, so I can only use OpenCL 1.2.

I cannot concatenate the strings because they come from different structures...

EDIT:

According to the first answer, if I have this (example):

char *p;

cl_mem cmHostString = clCreateBuffer(myDev.getcxGPUContext(), CL_MEM_ALLOC_HOST_PTR, BUFFER_SIZE, NULL, &oclErr);

oclErr = clEnqueueWriteBuffer(myDev.getCqCommandQueue(), cmHostString, CL_TRUE, 0, BUFFER_SIZE, p, 0, NULL, NULL);

Do I need to copy each element of my char array from host memory to another part of host memory (where the new address is hidden from the host)? That does not seem logical to me. Why can't I use the same address? I could access the host memory directly from the GPU device and use it.


Answer 1:


Is there any good way to use a large number of strings from the kernel code without copying them, leaving them in shared memory?

Not in OpenCL 1.2. The Shared Virtual Memory concept has been available since OpenCL 2.0, which NVIDIA does not support yet. You will need to either switch to a GPU that supports OpenCL 2.0 or, for OpenCL 1.2, copy your strings into a contiguous array of characters and pass (copy) that array to the kernel.
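The OpenCL 1.2 route (one contiguous character array) can be sketched on the host side roughly as follows. This is only a minimal illustration, not the asker's actual code: `PackedStrings`, `pack_strings` and `string_at` are hypothetical names, and the offset table is an assumed convention so that a kernel can find where each string starts and ends.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper type (not from the original post): all strings packed
// back to back, plus an offset table. String i occupies
// chars[offsets[i]] up to, but not including, chars[offsets[i + 1]].
struct PackedStrings {
    std::vector<char> chars;             // contiguous character data, no terminators
    std::vector<std::uint32_t> offsets;  // size() == number of strings + 1
};

// Copy every string into one flat buffer and record where each one ends.
// Pointers are taken so the strings can live in unrelated structures.
PackedStrings pack_strings(const std::vector<const std::string*>& strings) {
    PackedStrings packed;
    packed.offsets.reserve(strings.size() + 1);
    packed.offsets.push_back(0);
    for (const std::string* s : strings) {
        packed.chars.insert(packed.chars.end(), s->begin(), s->end());
        packed.offsets.push_back(static_cast<std::uint32_t>(packed.chars.size()));
    }
    return packed;
}

// Host-side mirror of the lookup a kernel would perform on the two arrays.
std::string string_at(const PackedStrings& packed, std::size_t i) {
    assert(i + 1 < packed.offsets.size());
    return std::string(packed.chars.begin() + packed.offsets[i],
                       packed.chars.begin() + packed.offsets[i + 1]);
}
```

With `first`, `second` and `third` from the question, `pack_strings({&first, &second, &third})` yields two flat arrays. Each could then go into its own buffer, with the kernel receiving them as, say, `__global const char*` and `__global const uint*` arguments and indexing the characters through the offsets instead of through host pointers.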


EDIT: Responding to your edit - you can use:

  • The CL_MEM_ALLOC_HOST_PTR flag to create an empty buffer of the required size, then map that buffer using clEnqueueMapBuffer and fill it through the pointer returned from the mapping. After that, unmap the buffer using clEnqueueUnmapMemObject.
  • The CL_MEM_USE_HOST_PTR flag to create a buffer of the required size, passing in a pointer to your array of characters.

In my experience, a buffer created with the CL_MEM_USE_HOST_PTR flag is usually slightly faster; whether the data is really copied under the hood depends, I think, on the implementation. But to use it, you first need to have your array of characters prepared on the host.

You basically need to benchmark and see which is faster. Also, don't concentrate too much on data copying: transfers run at GB/sec rates, so the copy time is usually tiny compared to how long it takes to run the kernel (depending, of course, on what's in the kernel).
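For the benchmarking itself, a small host-side timer is often enough. The sketch below is an assumption of how one might do it, not part of the original answer: `median_ms` is a hypothetical helper, and the callable passed to it is expected to contain the whole operation being compared (for example, the buffer setup, the kernel launch and a blocking wait such as clFinish), none of which is shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: run `work` `runs` times and return the median
// wall-clock duration of a single run, in milliseconds.
double median_ms(const std::function<void()>& work, int runs) {
    assert(runs > 0);
    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(runs));
    for (int i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();  // must block until the measured work has really finished
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    // Partially sort so the middle element is the median sample.
    const std::size_t mid = samples.size() / 2;
    std::nth_element(samples.begin(), samples.begin() + mid, samples.end());
    return samples[mid];
}
```

Calling it once with the CL_MEM_ALLOC_HOST_PTR variant and once with the CL_MEM_USE_HOST_PTR variant gives two numbers to compare. The median is used here so that a single slow first run (driver warm-up, for instance) does not dominate the result.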



Source: https://stackoverflow.com/questions/31759801/process-strings-form-opencl-kernel
