Is this a correct way of timing kernel execution for OpenCL? I am quite keen on using the C++ wrapper (which unfortunately does not have many timing examples).
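I think your approach should work just fine (or is it not working?). Alternatively, if you want to time each call, you can pass an event to enqueueNDRangeKernel and call getProfilingInfo on that event. Note that the command queue must be created with CL_QUEUE_PROFILING_ENABLE, otherwise the profiling query fails.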
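cl::Event evt;
// Pass &evt as the last argument so this launch can be profiled.
err = queue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(512), cl::NullRange, NULL, &evt);
// Block until the kernel has finished; END is not valid before that.
evt.wait();
// Both timestamps are device time in nanoseconds.
elapsed += evt.getProfilingInfo<CL_PROFILING_COMMAND_END>() -
           evt.getProfilingInfo<CL_PROFILING_COMMAND_START>();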
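The profiling timestamps are cl_ulong values in nanoseconds, so the accumulated difference usually needs converting before you print it. A minimal sketch of that arithmetic follows; the names Timestamp, elapsed_ns and ns_to_ms are made up for illustration and are not part of the OpenCL wrapper, and std::uint64_t is used as a stand-in for cl_ulong so the snippet compiles without the OpenCL headers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for cl_ulong (an unsigned 64-bit integer),
// the type OpenCL uses for profiling timestamps.
using Timestamp = std::uint64_t;

// Duration of one command in nanoseconds, given its START and END
// timestamps. END should never be earlier than START.
inline Timestamp elapsed_ns(Timestamp start, Timestamp end) {
    assert(end >= start);
    return end - start;
}

// Convert a nanosecond count to milliseconds for reporting.
inline double ns_to_ms(Timestamp ns) {
    return static_cast<double>(ns) / 1e6;
}
```

With the snippet above you would pass the two getProfilingInfo results to elapsed_ns, add the result to your running total, and call ns_to_ms once at the end when printing.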