I tested parallel_for_
in OpenCV by comparing with the normal operation for just simple array summation and multiplication.
I have array of 100 integer
Let me do some considerations:
clock()
function is not accurate at all. Its tick is roughly 1 / CLOCKS_PER_SEC
but how often it's updated and if it's uniform or not it's system and implementation dependent. See this post for more details about that.
Better alternatives to measure time:
Measures are always affected by errors. Performance measurement for your code is affected (short list, there is much more than that) by other programs, cache, operating system jobs, scheduling and user activity. To have a better measure you have to repeat it many times (let's say 1000 or more) then calculate average. Moreover you should prepare your test environment to be as clean as possible.
More details about tests on these posts:
In your case overhead for parallel execution (and your test code structure) is much higher that loop body itself. In this case it's not productive to make an algorithm parallel. Parallel execution must always be evaluated in a specific scenario, measured and compared. It's not kind of magic medicine to speed up everything. Take a look to this article about How to Quantify Scalability.
Just for example if you have to sum/multiply 100 numbers it's better to use SIMD instructions (even better within an unrolled loop).
Try to make your loop body empty (or to execute a single NOP
operation or volatile
write so it won't be optimized away). You'll roughly measure overhead. Now compare it with your results.
IMO this kind of test is pretty useless. You can't compare, in a generic way, serial or parallel execution. It's something you should always check against a specific situation (in real world many things will play, synchronization for example).
Imagine: you make your loop body really "heavy" and you'll see a big speed up with parallel execution. Now you make your real program parallel and you see performance is worse. Why? Because parallel execution is slowed down by locks, by cache problems or serial access to a shared resource.
Test itself is meaningless unless you're testing your specific code in your specific situation (because too many factors will play and you just can't ignore them). What it means? Well that you can compare only what you tested...if your program performs total *= buffertoClip[i];
then your results are reliable. If your real program does something else then you have to repeat tests with that something else.