Does the OpenCL specification set any maximum limit on the amount of private memory that can be used? If so, how do I get this number?
I have a function which gives the correct result when run outside OpenCL, but when converted to a kernel, it spews out garbage. I checked the amount of private memory used per work-item by querying the CL_KERNEL_PRIVATE_MEM_SIZE parameter (via clGetKernelWorkGroupInfo), and it is about 4000 bytes. I suspect that I am using too much private memory and that this is somehow leading to junk results.
It's different for different architectures. For example, an HD 7870 has 256 kB of private memory per compute unit. If you run 64 threads per compute unit, each thread gets 4 kB of private memory, which is about 1000 float values (1024, to be exact). If you increase the number of threads per compute unit further, private memory per thread drops, down to the 1 kB range at 256 threads. You should move some of the data into local memory to balance it.
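The arithmetic above can be sketched as a small C helper. It assumes, as the answer does, that the quoted 256 kB per compute unit is shared evenly among the work-items resident on that compute unit. The function name private_bytes_per_work_item is hypothetical, and since the compiler reserves part of this space for itself, the result is an upper bound rather than a budget you can fully use.

```c
/* Hypothetical helper: even share of a compute unit's private memory per
 * resident work-item. Assumes an even split; the compiler keeps some of
 * this space for itself, so the real usable amount is smaller. */
static unsigned long private_bytes_per_work_item(unsigned long private_bytes_per_cu,
                                                 unsigned long work_items_per_cu)
{
    if (work_items_per_cu == 0)
        return 0; /* avoid dividing by zero */
    return private_bytes_per_cu / work_items_per_cu;
}
```

With the HD 7870 figure, private_bytes_per_work_item(256UL * 1024UL, 64) gives 4096 bytes (1024 floats), and with 256 work-items per compute unit it gives 1024 bytes.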
More importantly, you cannot use all of it. The compiler uses a big portion for its own optimizations and for some other things I don't know about. You can never be sure without a profiler.
As per the OpenCL spec, the location and size of private memory are not defined; they are left for the vendor to decide. That raises the question of how much of it should be used: used correctly, it gives the best performance, and used incorrectly, it can cause a slowdown.
You can use AMD's CodeXL or NVIDIA's Nsight (if you have an AMD or NVIDIA card) to analyze the kernel's memory usage. With a little hands-on experience, you can understand register spilling using these tools.
I don't think that high private memory usage will lead to junk results; it could well be an issue in your code.
There isn't a theoretical limit for private memory (unlike local memory). If there were, clGetDeviceInfo would list it (it doesn't). However, I know there are practical limits. For example, some GPU implementations will try to store private memory in the register file if it fits. If you exceed this, it spills out to main memory, which may be orders of magnitude more expensive. Regardless, the result should be correct (just achieved much more slowly). It should not junk your computation.
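To get a rough feel for whether the roughly 4000 bytes reported by CL_KERNEL_PRIVATE_MEM_SIZE might exceed what fits on chip, one could compare it with an even split of the compute unit's register file. This is a hypothetical heuristic, not a vendor API: may_spill and its parameters are illustrative, the register-file size has to come from vendor documentation, and how implementations fill in CL_KERNEL_PRIVATE_MEM_SIZE varies. A true result only suggests spilling, which costs speed but should not change the results.

```c
#include <stdbool.h>

/* Hypothetical heuristic: does a kernel's per-work-item private memory
 * exceed an even share of the compute unit's register file?
 * true  -> private data may spill to off-chip memory (slower, still correct)
 * false -> it may fit, but the compiler reserves part of the register file,
 *          so this is not a guarantee. */
static bool may_spill(unsigned long private_bytes_per_work_item,
                      unsigned long regfile_bytes_per_cu,
                      unsigned long work_items_per_cu)
{
    if (work_items_per_cu == 0)
        return false; /* nothing resident, nothing to compare against */
    unsigned long share = regfile_bytes_per_cu / work_items_per_cu;
    return private_bytes_per_work_item > share;
}
```

For example, with 256 kB per compute unit, 4000 bytes per work-item fits the 4096-byte share at 64 work-items, but not the 1024-byte share at 256 work-items.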
Source: https://stackoverflow.com/questions/22083507/is-there-a-maximum-limit-to-private-memory-in-opencl