I have always had the idea that reducing the number of iterations is the way to make programs more efficient. Since I never really confirmed that, I set out to test it.
When trying to benchmark code, you need to:

1. enable compiler optimizations, and
2. average the measurements over many runs.
You didn't do both. You could use -O3, for example, and as for the average, I did this (I made each function return an element from a list, so the work cannot be optimized away):
    for (int i = 0; i < 100; ++i)
        dummy = myFunc1();
Then, I got an output like this:
    Time taken by func1 (micro s):206693
    Time taken by func2 (micro s):37898
That confirms what you saw, but here the difference is roughly a factor of five, which is a very big deal.
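For reference, here is a minimal, self-contained sketch of the kind of harness I mean. The bodies of myFunc1 and myFunc2 are my assumptions about what your two functions roughly look like (I use std::vector for the lists, with one fused loop versus one loop per list); the exact size N and element values do not matter for the point:

    #include <chrono>
    #include <cstddef>
    #include <iostream>
    #include <vector>

    static const std::size_t N = 1000000;   // assumed list size

    // Assumed shape of the first function: one big loop filling every list.
    int myFunc1() {
        std::vector<int> list1(N), list2(N), list3(N);
        for (std::size_t i = 0; i < N; ++i) {
            list1[i] = static_cast<int>(i);
            list2[i] = static_cast<int>(i) * 2;
            list3[i] = static_cast<int>(i) * 3;
        }
        return list2[N / 2];   // return an element so the work is not optimized away
    }

    // Assumed shape of the second function: one loop per list.
    int myFunc2() {
        std::vector<int> list1(N), list2(N), list3(N);
        for (std::size_t i = 0; i < N; ++i) list1[i] = static_cast<int>(i);
        for (std::size_t i = 0; i < N; ++i) list2[i] = static_cast<int>(i) * 2;
        for (std::size_t i = 0; i < N; ++i) list3[i] = static_cast<int>(i) * 3;
        return list2[N / 2];
    }

    int main() {
        volatile int dummy = 0;   // volatile so the calls are not removed as dead code

        auto t0 = std::chrono::high_resolution_clock::now();
        for (int i = 0; i < 100; ++i)
            dummy = myFunc1();
        auto t1 = std::chrono::high_resolution_clock::now();
        std::cout << "Time taken by func1 (micro s):"
                  << std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count()
                  << std::endl;

        auto t2 = std::chrono::high_resolution_clock::now();
        for (int i = 0; i < 100; ++i)
            dummy = myFunc2();
        auto t3 = std::chrono::high_resolution_clock::now();
        std::cout << "Time taken by func2 (micro s):"
                  << std::chrono::duration_cast<std::chrono::microseconds>(t3 - t2).count()
                  << std::endl;

        return dummy;
    }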
In a single for-loop, you set up the loop once and increment one counter per iteration. With several for-loops, that housekeeping is repeated for each loop you have: every one of them gets its own setup, its own counter increments, and its own end-of-loop checks. When the body of the loop is fairly trivial, like in your case, this overhead can make a difference.
Another issue is data locality. The second function has loops that each populate one list at a time, so memory is accessed in a contiguous fashion. In the big loop of your first function, you fill one element of each list per iteration, which boils down to interleaved access across the lists: when list1 is brought into the cache (because you just filled an element of it), the very next line of your code requests list2, so the freshly cached part of list1 is not put to further use. In the second function, once list1 is brought into the cache, you keep using it from the cache rather than fetching it from memory again, which results in a major speedup.
I believe this effect dominates the other one (one big loop vs. several small loops) here. So you are not actually benchmarking what you wanted to, but rather interleaved memory access vs. contiguous memory access.
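If you wanted to measure only the loop-bookkeeping cost, one way (a sketch, not taken from your code) is to run a fused and a split version over the same small array, chosen to fit comfortably in cache, so the memory behaviour of the two versions is essentially identical and only the extra counter increments and end-of-loop checks differ:

    #include <cstddef>
    #include <vector>

    // Small enough to stay resident in cache on typical machines (assumption).
    static const std::size_t M = 16 * 1024;

    // One loop: one set of counter increments and end-of-loop checks.
    int fusedPass(std::vector<int>& a) {
        for (std::size_t i = 0; i < M; ++i) {
            a[i] += 1;
            a[i] ^= 3;
        }
        return a[M / 2];
    }

    // Two loops over the same data: same work, same access pattern,
    // but the loop bookkeeping is paid twice.
    int splitPass(std::vector<int>& a) {
        for (std::size_t i = 0; i < M; ++i) a[i] += 1;
        for (std::size_t i = 0; i < M; ++i) a[i] ^= 3;
        return a[M / 2];
    }

    int main() {
        std::vector<int> a(M, 1);
        volatile int sink = 0;
        // Time these two calls with the same chrono harness as above.
        sink = fusedPass(a);
        sink = splitPass(a);
        return sink;
    }

Any gap you measure there comes from the loop bookkeeping alone, which tells you how much of the gap in your original benchmark was really due to the memory access pattern.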