Is there a maximum limit to private memory in OpenCL?

烂漫一生 提交于 2019-12-06 10:48:45

Its different for different archs. For example, a hd7870's private memory per compute-unit is 256kB and if your setting is 64 threads per compute unit, then each thread will have 4kB private memory which means 1000 float values. If you increase threads per compute unit further, privates/thread will drop to even 1kB range. You should add some local memory usage to balance it.

More importantly, you can not use all of it. Compiler uses big portion for its own optimizations and some things that I dont know. You can never be sure without a profiler.

As per OpenCL spec the location and size is not defined i.e. it left for vendor to decide. Which puts a question on How much is to be used. If used correctly gets the best performance and if not can be became the cause for slowdown.

You can use AMD's CodeXL or NVIDIA's Nsight (If you have AMD or NVIDIA cards) to analyze memory usage by the kernel. With little hands on tool you can understand the register spilling using these tool.

I don't think that the high usage of private memory will lead to the junk result, it could certainly be a issue in your code.

There isn't a theoretical limit for private memory (unlike local memory). If there was, clGetDeviceInfo would list it (it doesn't). However, I know there are practical limits. For example, some GPU implementations will try and store private memory in the register file if it fits. If you exceed this, it spills out to main memory and may be orders of magnitude more expensive. Regardless, the result should be correct (just achieved much slower). It should not junk your computation.

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!