How to measure CPU time and wall clock time?

Submitted by 拈花ヽ惹草 on 2019-12-03 20:45:45

My manual page for clock says:

POSIX requires that CLOCKS_PER_SEC equals 1000000 independent of the actual resolution.

When I increase the number of iterations on my computer, the measured CPU time only starts to show up at 100000 iterations. From the returned figures, the actual resolution appears to be 10 milliseconds.
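To check this on your own machine, here is a minimal sketch, assuming a POSIX system where clock_gettime and CLOCK_MONOTONIC are available. The helper names wall_seconds, cpu_seconds and clock_step_seconds are my own, not part of any library. The last one spins until clock() returns a new value, which gives a rough idea of the real granularity hiding behind CLOCKS_PER_SEC.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Wall-clock seconds read from a monotonic clock (POSIX clock_gettime). */
static double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* CPU seconds consumed by this process so far, as reported by clock(). */
static double cpu_seconds(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}

/* Smallest observable step of clock(): spin until the returned value changes. */
static double clock_step_seconds(void)
{
    clock_t start = clock();
    clock_t next;
    do {
        next = clock();
    } while (next == start);
    return (double)(next - start) / CLOCKS_PER_SEC;
}
```

Calling cpu_seconds() and wall_seconds() before and after the loop and subtracting gives the two times. A step near 0.01 from clock_step_seconds() would match the 10 millisecond resolution I observed, but the value depends on the platform.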

Beware that when you compile with optimization, the whole loop may disappear because sum is a dead value. There is also nothing to stop the compiler from moving the clock calls across the loop, as there are no real dependences between them and the code in between.
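One common way to keep the result alive is to store it in a volatile variable, or to print it. The sketch below assumes a summing loop like the one in the question; the names sink, sum_to and run_workload are hypothetical. Note that this only keeps the result from being dead: the compiler may still replace the loop with a closed-form expression, and it does nothing about the clock calls being moved.

```c
#include <stdint.h>

/* A volatile store is an observable side effect, so the value must be produced. */
static volatile uint64_t sink;

/* Hypothetical stand-in for the loop being measured. */
static uint64_t sum_to(uint64_t n)
{
    uint64_t sum = 0;
    for (uint64_t i = 0; i < n; i++)
        sum += i;
    return sum;
}

/* Run the workload and publish its result so it is no longer a dead value. */
static void run_workload(uint64_t n)
{
    sink = sum_to(n);
}
```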

Let me elaborate a bit more on micro-measurements of code performance. The naive and tempting way to measure performance is indeed to add clock calls as you have done. However, since time is not a concept or side effect in C, compilers can often move these clock calls at will. To remedy this, it is tempting to give the clock calls side effects, for example by having them access volatile variables. However, this still doesn't prevent the compiler from moving code that is free of side effects across the calls. Think, for example, of code that only accesses regular local variables. Worse, by making the clock calls look very scary to the compiler, you will actually hurt its optimizations. As a result, the mere act of measuring changes the performance in a negative and undesirable way.
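For illustration, this is roughly what the tempting remedy looks like with a compiler barrier. It is a sketch only: the empty asm statement with a "memory" clobber is a GCC/Clang extension, not standard C, and the names COMPILER_BARRIER, time_call and bump are hypothetical. The barrier stops memory accesses from being moved across it, but computations that live purely in registers can still migrate, and the barrier itself can block optimizations, which is exactly the problem described above.

```c
#include <time.h>

/* GCC/Clang extension: the compiler may not move memory accesses across this. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Time one call of fn(arg) in CPU seconds, with the clock() calls fenced off. */
static double time_call(void (*fn)(void *), void *arg)
{
    COMPILER_BARRIER();
    clock_t t0 = clock();
    COMPILER_BARRIER();
    fn(arg);
    COMPILER_BARRIER();
    clock_t t1 = clock();
    COMPILER_BARRIER();
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

/* Hypothetical workload: increments the int that arg points to. */
static void bump(void *arg)
{
    *(int *)arg += 1;
}
```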

If you use profiling, as already mentioned by someone, you can get a pretty good assessment of the performance of even optimized code, although the overall time of course is increased.

Another good way to measure performance is simply to ask the compiler to report the number of cycles some code will take. For a lot of architectures the compiler has a very accurate estimate of this. The most notable exception is the Pentium architecture, where the hardware does a lot of scheduling that is hard to predict.

Although it is not standard practice, I think compilers should support a pragma that marks a function to be measured. The compiler could then insert high-precision, non-intrusive measuring points in the prologue and epilogue of the function and prohibit any inlining of it. Depending on the architecture, it could choose a high-precision clock to measure time, preferably with support from the OS so that only the time of the current process is measured.
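No such pragma exists as far as I know, but the idea can be approximated by hand. The sketch below assumes a POSIX system that provides CLOCK_PROCESS_CPUTIME_ID and a compiler that understands the GCC/Clang noinline attribute. The names process_cpu_ns, work and work_measured are hypothetical, and the wrapper is only a stand-in for what a compiler-generated prologue and epilogue might do.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Nanoseconds of CPU time consumed by the current process only. */
static int64_t process_cpu_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical function under measurement; noinline keeps it a real call. */
__attribute__((noinline))
static uint64_t work(uint64_t n)
{
    uint64_t acc = 0;
    for (uint64_t i = 0; i < n; i++)
        acc += i * i;
    return acc;
}

/* Hand-written stand-in for the measuring points in prologue and epilogue. */
static uint64_t work_measured(uint64_t n, int64_t *elapsed_ns)
{
    int64_t start = process_cpu_ns();
    uint64_t result = work(n);
    *elapsed_ns = process_cpu_ns() - start;
    return result;
}
```

Unlike a real compiler-supported version, this wrapper is itself ordinary code, so the caveats about the compiler moving or transforming things around the clock reads still apply.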
