I have a GPU with CC 3.0, so it should support 16 concurrent kernels. I am starting 10 kernels by calling clEnqueueNDRangeKernel 10 times in a loop. How do I get to know t
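Yes, as you suggest, try using events: pass a cl_event to each clEnqueueNDRangeKernel call, then read the QUEUED, SUBMIT, START, and END values with clGetEventProfilingInfo (the command queue has to be created with CL_QUEUE_PROFILING_ENABLE for this to work). These should be absolute values in "device time", reported in nanoseconds, so you may be able to see whether processing (START to END) overlaps for the different kernels.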
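Once you have the START and END values, the overlap check itself is simple. Here is a minimal sketch in plain C, assuming you have already copied the timestamps out of clGetEventProfilingInfo into an array; the `kernel_span` struct and both function names are made up for illustration and are not part of OpenCL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for one kernel's profiling timestamps (nanoseconds).
 * In a real program these would be filled from clGetEventProfilingInfo with
 * CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END (both cl_ulong). */
typedef struct {
    uint64_t start_ns;
    uint64_t end_ns;
} kernel_span;

/* Two kernels overlapped if each one started before the other finished. */
static int spans_overlap(kernel_span a, kernel_span b)
{
    return a.start_ns < b.end_ns && b.start_ns < a.end_ns;
}

/* Count the pairs of kernels whose START..END intervals overlap. */
static size_t count_overlapping_pairs(const kernel_span *spans, size_t n)
{
    size_t pairs = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Sanity check: a kernel cannot end before it starts. */
        assert(spans[i].start_ns <= spans[i].end_ns);
        for (size_t j = i + 1; j < n; ++j) {
            if (spans_overlap(spans[i], spans[j]))
                ++pairs;
        }
    }
    return pairs;
}
```

If you feed it the 10 spans from your loop and it returns 0, the kernels most likely ran one after another; any nonzero count means at least two of them were executing at the same time.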