The OpenCL standard defines the following options for querying information about the device and the compiled kernel:
CL_DEVICE_MAX_COMPUTE_UNITS
CL_DEVICE_MAX_WORK_G
As mfa said, you have to discover these experimentally. I wanted to add that, depending on what you are computing (particularly the size of the jobs, i.e. how much work each work item does), a good thing to try is sometimes:
That is, check the base cases and figure out how each one affects the processing pipeline.
In essence, you have to tweak it. I often run the kernel several times with different parameters (profiling each run) and then generate a surface plot of the timings to see how it behaves.
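A minimal sketch of that kind of parameter sweep, in Python for brevity. The names `time_call`, `sweep`, `best_config` and `fake_measure` are made up for illustration, and `fake_measure` is a synthetic cost model standing in for a real kernel launch, so the numbers it produces are not measurements. The divisibility check assumes OpenCL 1.x rules, where the global work size must be a multiple of the local work size.

```python
import time


def time_call(fn, repeats=3):
    """Return the best wall-clock time (in seconds) over `repeats` calls to fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def sweep(measure, global_sizes, local_sizes):
    """Call measure(global, local) for every valid pair; return {(global, local): cost}."""
    results = {}
    for g in global_sizes:
        for l in local_sizes:
            # Assumption: OpenCL 1.x, where global size must be divisible by local size.
            if g % l != 0:
                continue
            results[(g, l)] = measure(g, l)
    return results


def best_config(results):
    """Return the (global, local) pair with the lowest measured cost."""
    return min(results, key=results.get)


def fake_measure(global_size, local_size):
    # Hypothetical stand-in for "enqueue the kernel and time it".
    # Synthetic cost: cheapest at local_size == 64 and at larger global sizes.
    return abs(local_size - 64) / 64 + 1024 / global_size


results = sweep(fake_measure, [1024, 2048, 4096], [16, 32, 64, 128, 256])
print(best_config(results))
```

In real use, `measure` would enqueue the kernel with the given sizes, wait for it to finish, and return something like `time_call(launch)`. The `results` dictionary is already in the shape you need for a surface plot: global size on one axis, local size on the other, time as the height.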