Is CL_DEVICE_LOCAL_MEM_SIZE for the entire device, or per work-group?

橙三吉。 Submitted on 2019-12-10 17:57:58

Question


I'm not quite clear on the actual meaning of CL_DEVICE_LOCAL_MEM_SIZE, which is obtained through the clGetDeviceInfo function. Does this value indicate the total of all the local memory available on a device, or the upper limit on the local memory available to a single work-group?


Answer 1:


TL;DR: Per compute unit, hence also the maximum that can be allotted to a single work-group.

This value is the amount of local memory available on each compute unit in the device. Since a work-group is assigned to a single compute unit, this is also the maximum amount of local memory that any work-group can have.

For performance reasons on many GPUs, it is usually desirable to have multiple work-groups running on each compute unit concurrently (to hide memory access latency, for example). If one work-group uses all of the available local memory, the device will not be able to schedule any other work-groups onto the same compute unit until that work-group has finished. If possible, it is recommended to limit the amount of local memory each work-group uses (to e.g. a quarter of the total local memory) to allow multiple work-groups to run on the same compute unit concurrently.
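The arithmetic behind that recommendation can be sketched in plain C. This is only an illustration: the helper name max_groups_by_local_mem is hypothetical, and it assumes local memory is the only limiting resource, whereas a real device also limits residency by registers, work-group size and other hardware constraints. The first argument stands for the cl_ulong value returned by clGetDeviceInfo for CL_DEVICE_LOCAL_MEM_SIZE; the second is the local memory one work-group of your kernel uses.

```c
#include <stdint.h>

/* Hypothetical helper (not part of OpenCL): upper bound on the number of
 * work-groups that can be resident on one compute unit at the same time,
 * assuming local memory is the only limiting resource.
 *
 * cu_local_mem_bytes    - value reported for CL_DEVICE_LOCAL_MEM_SIZE
 * group_local_mem_bytes - local memory used by a single work-group
 */
uint64_t max_groups_by_local_mem(uint64_t cu_local_mem_bytes,
                                 uint64_t group_local_mem_bytes)
{
    if (group_local_mem_bytes == 0) {
        /* The kernel uses no local memory, so local memory imposes no limit. */
        return UINT64_MAX;
    }
    /* Integer division: only whole work-groups can be scheduled.
     * A result of 0 means one work-group asks for more than is available. */
    return cu_local_mem_bytes / group_local_mem_bytes;
}
```

For instance, with 32768 bytes reported and a work-group using all 32768 bytes, the bound is 1, so nothing else can share the compute unit. Dropping to 8192 bytes per work-group (a quarter of the total) raises the bound to 4.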



Source: https://stackoverflow.com/questions/31197564/is-cl-device-local-mem-size-for-the-entire-device-or-per-work-group
