How to profile multiple sequentially launched OpenCL kernels with one clFinish?

Submitted by 放荡痞女 on 2020-01-12 08:36:06

Question


I have multiple kernels that are launched sequentially, like this:

        clEnqueueNDRangeKernel(..., kernel1, ...);
        clEnqueueNDRangeKernel(..., kernel2, ...);
        clEnqueueNDRangeKernel(..., kernel3, ...);

All of the kernels share one global buffer.

Currently I profile each kernel execution separately and sum the results to get the total execution time, by adding the following block after each clEnqueueNDRangeKernel call:

        clFinish(cmdQueue);
        status = clGetEventProfilingInfo(...,&starttime,...);
        clGetEventProfilingInfo(...,&endtime,...);
        time_spent = endtime - starttime;

My question is: how can I profile all three kernels together with a single clFinish (for example, by calling clFinish() once after the last kernel launch)?

Yes, I pass a different timing event to each clEnqueueNDRangeKernel call, and I get a large negative number. In detail:

        clEnqueueNDRangeKernel(cmdQueue,...,&timing_event1);
        clFinish(cmdQueue);
        clGetEventProfilingInfo(timing_event1,CL_PROFILING_COMMAND_START,sizeof(cl_ulong),&starttime1,NULL);
        clGetEventProfilingInfo(timing_event1,CL_PROFILING_COMMAND_END,sizeof(cl_ulong),&endtime1,NULL);
        time_spent1 = endtime1 - starttime1;

        clEnqueueNDRangeKernel(cmdQueue,...,&timing_event2);
        clFinish(cmdQueue);
        clGetEventProfilingInfo(timing_event2,CL_PROFILING_COMMAND_START,sizeof(cl_ulong),&starttime2,NULL);
        clGetEventProfilingInfo(timing_event2,CL_PROFILING_COMMAND_END,sizeof(cl_ulong),&endtime2,NULL);
        time_spent2 = endtime2 - starttime2;

        clEnqueueNDRangeKernel(cmdQueue,...,&timing_event3);
        clFinish(cmdQueue);
        clGetEventProfilingInfo(timing_event3,CL_PROFILING_COMMAND_START,sizeof(cl_ulong),&starttime3,NULL);
        clGetEventProfilingInfo(timing_event3,CL_PROFILING_COMMAND_END,sizeof(cl_ulong),&endtime3,NULL);
        time_spent3 = endtime3 - starttime3;

        time_spent_all_0 = time_spent1 + time_spent2 + time_spent3;
        time_spent_all_1 = endtime3 - starttime1;

With a clFinish after every launch, all profiling values are reasonable, but time_spent_all_1 is about twice time_spent_all_0. If I remove every clFinish except the last one, none of the profiling values are reasonable.
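The two totals measure different things, which may account for part of the difference: the sum counts only the time spent inside each kernel, while the first-START-to-last-END span also counts any idle time between kernels (for example, while the host waits in clFinish before enqueueing the next one). A minimal sketch of that arithmetic in plain C follows. It assumes nothing about the actual device; the struct and function names are hypothetical, and uint64_t stands in for cl_ulong.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the START/END pair read from one timing event;
 * uint64_t plays the role of cl_ulong (nanoseconds). */
typedef struct {
    uint64_t start_ns;
    uint64_t end_ns;
} kernel_times;

/* Sum of per-kernel durations: the time_spent_all_0 style of total. */
uint64_t sum_of_durations(const kernel_times *t, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += t[i].end_ns - t[i].start_ns;
    }
    return total;
}

/* First START to last END: the time_spent_all_1 style of total.
 * Any idle time between kernels is included. Assumes n >= 1 and that
 * the entries are in launch order. */
uint64_t first_to_last_span(const kernel_times *t, size_t n)
{
    return t[n - 1].end_ns - t[0].start_ns;
}
```

With made-up timestamps of {1000, 1400}, {1900, 2300} and {2800, 3200} ns, the sum is 1200 ns while the span is 2200 ns, because the span also counts the two 500 ns gaps.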

Thanks to Eric Bainville, I now have the result I wanted: profiling multiple clEnqueueNDRangeKernel calls with one clFinish. This is the final code I use:

        clEnqueueNDRangeKernel(cmdQueue,...,&timing_event1);
        clEnqueueNDRangeKernel(cmdQueue,...,&timing_event2);
        clEnqueueNDRangeKernel(cmdQueue,...,&timing_event3);
        clFinish(cmdQueue);

        clGetEventProfilingInfo(timing_event1,CL_PROFILING_COMMAND_START,sizeof(cl_ulong),&starttime,NULL);
        clGetEventProfilingInfo(timing_event3,CL_PROFILING_COMMAND_END,sizeof(cl_ulong),&endtime,NULL);
        time_spent = endtime - starttime;

Answer 1:


Each clEnqueueNDRangeKernel call creates its own cl_event: the last argument of the call is a pointer to a cl_event, and if that argument is not NULL, a new event is created.

After a command has completed, its associated event can be queried for the start and end profiling info. The event must be released after use (call clReleaseEvent).
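The profiling counters are reported as unsigned 64-bit nanosecond values (cl_ulong), so an END that is smaller than START does not give a negative result: the subtraction wraps around to a huge number. Below is a minimal sketch in plain C of turning one START/END pair into milliseconds while rejecting such a pair. The helper name elapsed_ms is hypothetical and uint64_t stands in for cl_ulong. An inverted pair is one way that querying before the command has completed, or ignoring a failed query, might show up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: uint64_t stands in for cl_ulong.
 * Writes the elapsed time in milliseconds to *out_ms and returns true,
 * or returns false when end_ns < start_ns (the unsigned subtraction
 * would wrap around instead of going negative). */
bool elapsed_ms(uint64_t start_ns, uint64_t end_ns, double *out_ms)
{
    if (end_ns < start_ns) {
        return false;
    }
    *out_ms = (double)(end_ns - start_ns) / 1.0e6; /* ns -> ms */
    return true;
}
```

Checking the status returned by clGetEventProfilingInfo before trusting the values is a complementary guard.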

clFinish blocks until all enqueued commands are completed.

You need only one call to clFinish, and then you can query profiling info for all events.



Source: https://stackoverflow.com/questions/11363780/how-to-profile-sequential-launched-multiple-opencl-kernels-by-one-clfinish
