Is this a correct way of timing kernel execution time for OpenCL? I am quite keen on using the C++ wrapper (which unfortunately does not have many examples of timings).
I think your approach should work just fine (is it not working?). Alternatively, if you want to time each call individually, you can pass an event to enqueueNDRangeKernel and call getProfilingInfo on that event once the kernel has finished.
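// Profiling queries only work if the queue was created with CL_QUEUE_PROFILING_ENABLE.
cl::Event evt;
err = queue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(512), cl::NullRange, NULL, &evt);
evt.wait();  // block until the kernel has completed so both timestamps are available
elapsed += evt.getProfilingInfo<CL_PROFILING_COMMAND_END>() -
           evt.getProfilingInfo<CL_PROFILING_COMMAND_START>();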
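
The START and END profiling values are cl_ulong device timestamps in nanoseconds, so the accumulated total usually needs converting before you report it. Below is a minimal sketch of that arithmetic only. It does not touch OpenCL: KernelTiming, total_elapsed_ns and ns_to_ms are hypothetical helper names, and std::uint64_t stands in for cl_ulong so the snippet compiles without the OpenCL headers.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the two timestamps of one kernel launch.
// In real code these would come from
// evt.getProfilingInfo<CL_PROFILING_COMMAND_START>() and
// evt.getProfilingInfo<CL_PROFILING_COMMAND_END>() (cl_ulong, nanoseconds).
struct KernelTiming {
    std::uint64_t start_ns;
    std::uint64_t end_ns;
};

// Sum the per-launch durations, staying in integer nanoseconds
// so no precision is lost while accumulating.
std::uint64_t total_elapsed_ns(const std::vector<KernelTiming>& timings) {
    std::uint64_t total = 0;
    for (const KernelTiming& t : timings) {
        total += t.end_ns - t.start_ns;
    }
    return total;
}

// Convert a nanosecond count to milliseconds for reporting.
double ns_to_ms(std::uint64_t ns) {
    return static_cast<double>(ns) / 1.0e6;
}
```

For example, you could push one KernelTiming per launch inside your timing loop, then print ns_to_ms(total_elapsed_ns(timings)) divided by the number of launches to get the mean kernel time in milliseconds.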