What is the algorithm to determine the optimal work-group size and number of work groups?
The OpenCL standard defines the following options for getting information about a device and a compiled kernel:

- CL_DEVICE_MAX_COMPUTE_UNITS
- CL_DEVICE_MAX_WORK_GROUP_SIZE
- CL_KERNEL_WORK_GROUP_SIZE
- CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE

Given these values, how can I calculate the optimal work-group size and number of work groups?

You have to discover these values experimentally for your algorithm. Use a profiler to get hard numbers.

I like to use CL_DEVICE_MAX_COMPUTE_UNITS as the number of work groups, because I often rely on synchronizing work items. I usually run kernels with little branching, so they take the