Passing arguments through __local memory in OpenCL

Submitted by 青春壹個敷衍的年華 on 2019-12-11 04:02:28

Question


I am confused about __local memory in OpenCL. I read in the spec that data has to flow from the host to __global memory and then to __local memory. But I have also seen kernel functions like this:

__kernel void foo(__local float * a)

How is the data transferred directly into __local memory in this case?

Thanks.


Answer 1:


It is not possible to fill a local buffer from the host. You therefore have to follow the flow host -> __global -> __local.

A local buffer can be set up in one of two ways: from the host, by passing it as a kernel parameter, or inside the kernel itself. For a kernel parameter the host only specifies the size: clSetKernelArg is called with the number of bytes and a NULL value pointer, and no data is transferred. Doing it from the host lets you choose the buffer size before each launch, which matters when the required local buffer size differs from one run of the kernel to the next.
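A minimal sketch of the host -> __global -> __local flow with the local buffer passed as a kernel parameter. The kernel name reverse_in_group and what it computes (reversing each work-group's slice of the input) are hypothetical, chosen only to show the pattern. The kernel copies its own data from __global into __local, synchronizes with barrier, and then reads elements written by other work-items of the same group. It assumes the global size is a multiple of the local size.

```c
// OpenCL C kernel (device code).
// Host side, for the __local argument only a size and a NULL value are passed:
//   clSetKernelArg(kernel, 1, local_size * sizeof(cl_float), NULL);
// where local_size is the local work size given to clEnqueueNDRangeKernel.
__kernel void reverse_in_group(__global const float *in,
                               __local float *tile,
                               __global float *out)
{
    size_t gid = get_global_id(0);
    size_t lid = get_local_id(0);
    size_t lsz = get_local_size(0);

    // Step 1: each work-item copies one element from __global into __local.
    tile[lid] = in[gid];

    // Wait until every work-item in the group has written its element.
    barrier(CLK_LOCAL_MEM_FENCE);

    // Step 2: read an element that a different work-item wrote.
    out[gid] = tile[lsz - 1 - lid];
}
```

Because the host supplies only a size for argument 1, the same compiled kernel can be launched with a different local work size each time, as long as clSetKernelArg is called again with the matching size before the launch.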




Answer 2:


Local memory is visible only to a single work-group, and on many architectures it is allocated by the hardware when the work-group is dispatched. Hardware that can run several work-groups, possibly from different kernels, on each compute unit (CU) lets the scheduler partition local memory among the groups being issued. The region does not exist before the group is launched and does not exist after the group terminates. Its size is what you pass in, as the other answers have pointed out.

Consequently, on many architectures the only way to fill local memory from the host would be for the compiler to insert kernel code that copies the data in from global memory. Given that, writing the copy manually costs nothing in performance and gives the programmer more control over exactly what happens. It also avoids a situation in which the compiler always generates copy code and copies more than necessary because the API does not make clear which memory is copy-in and which is not.

In summary, you cannot fill local memory automatically. In practice you will rarely want to, because doing it manually lets you place only the result of a first processing stage in local memory, eliminating extra copies, or transform the data on the way in, for example padding or transposing it to avoid bank conflicts.
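The padding and transposition point can be illustrated with the usual tiled matrix transpose, given here as a sketch rather than a tuned kernel. Each work-group copies a TILE x TILE block from __global into a __local tile that is padded by one column, then writes it back transposed. TILE = 16, the kernel name transpose_tiled and the requirement that both matrix dimensions be multiples of TILE are assumptions for the example; how much the padding helps depends on the hardware having banked local memory.

```c
// OpenCL C kernel (device code).
// Launch with global size {cols, rows} and local size {TILE, TILE}.
#define TILE 16  // assumed tile edge

// Transposes the row-major rows x cols matrix `in` into `out` (cols x rows).
// Assumes rows and cols are multiples of TILE.
__kernel void transpose_tiled(__global const float *in,
                              __global float *out,
                              int rows, int cols)
{
    // The extra column shifts each tile row to a different bank, so reading
    // a column of the tile does not hit the same bank repeatedly.
    __local float tile[TILE][TILE + 1];

    int lx = (int)get_local_id(0);
    int ly = (int)get_local_id(1);
    int gx = (int)get_group_id(0);
    int gy = (int)get_group_id(1);

    // Read: neighbouring work-items load neighbouring elements of a row of `in`.
    tile[ly][lx] = in[(gy * TILE + ly) * cols + (gx * TILE + lx)];

    barrier(CLK_LOCAL_MEM_FENCE);

    // Write: swap the group indices and read the tile column-wise, so the
    // writes to `out` are again contiguous along a row.
    out[(gx * TILE + ly) * rows + (gy * TILE + lx)] = tile[lx][ly];
}
```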




Answer 3:


As @doqtor said, the size of a __local kernel parameter is specified with clSetKernelArg.

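Alternatively, a __local array can be declared inside the kernel body, in which case no local-memory kernel parameter is required. Note, however, that OpenCL C (including 1.2) does not support variable length arrays, so the size of such an array must be a compile-time constant.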
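A sketch of the kernel-scope alternative, assuming the array size is fixed when the program is built. LOCAL_SIZE and group_sum are hypothetical names: the host supplies the size as a macro through the options string of clBuildProgram, and the kernel must then be launched with a local work size equal to LOCAL_SIZE.

```c
// OpenCL C kernel (device code).
// Host side, the size is chosen at build time, e.g.:
//   clBuildProgram(program, 0, NULL, "-DLOCAL_SIZE=128", NULL, NULL);
#ifndef LOCAL_SIZE
#define LOCAL_SIZE 256  // fallback if no -D option is given
#endif

// Sums each work-group's slice of `in` into out[group id].
// Assumes LOCAL_SIZE is a power of two and equals the local work size.
__kernel void group_sum(__global const float *in, __global float *out)
{
    // Compile-time size, so no __local kernel argument is needed.
    __local float scratch[LOCAL_SIZE];

    size_t lid = get_local_id(0);
    scratch[lid] = in[get_global_id(0)];
    barrier(CLK_LOCAL_MEM_FENCE);

    // Tree reduction: halve the number of active work-items each step.
    // Every work-item runs the same iterations, so all reach each barrier.
    for (size_t stride = LOCAL_SIZE / 2; stride > 0; stride /= 2) {
        if (lid < stride)
            scratch[lid] += scratch[lid + stride];
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    if (lid == 0)
        out[get_group_id(0)] = scratch[0];
}
```

The trade-off compared with a __local parameter is that changing the size means rebuilding the program rather than just calling clSetKernelArg again.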



Source: https://stackoverflow.com/questions/30249801/passing-arguments-through-local-memory-in-opencl
