I have the following code that writes zeros into a global array twice, once forward and once backward. The two passes take noticeably different amounts of time, and I don't understand why:
#include <stdio.h>
#include <time.h>

#define SIZE 100000000

char c[SIZE];
char c2[SIZE];

int main()
{
    int i;
    clock_t t = clock();
    for(i = 0; i < SIZE; i++)
        c[i] = 0;
    t = clock() - t;
    printf("%ld\n\n", (long)t);

    t = clock();
    for(i = SIZE - 1; i >= 0; i--)
        c[i] = 0;
    t = clock() - t;
    printf("%ld\n\n", (long)t);
}
Following asimes' answer that it's due to caching: I'm not convinced you can enjoy much benefit from caches with a ~100M array. A typical last-level cache is only a few MB, so you're likely to completely thrash out any useful data long before a loop returns to it.
However, depending on your platform (the OS, mostly), there are other mechanisms at work. When you allocate the arrays you never initialize them, so the first loop probably incurs the penalty of the first access to each 4k page. That first access usually triggers a page fault, a form of OS assist that comes with high overhead.
In this case you also modify each page, so most systems would be forced to perform a copy-on-write flow (an optimization that works as long as you only read from a page; on Linux, for instance, untouched zero-initialized pages can all be backed by a single shared zero page), and this is even heavier.
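If you want to check that this is what's happening, counting soft page faults around each pass makes the effect directly visible. Here's a minimal sketch, assuming a POSIX system with Linux-style getrusage() fault accounting (the array name big and the two-pass structure are mine, for illustration):

/* Count minor (soft) page faults around each write pass.
   Expect roughly SIZE/4096 faults on the first pass and ~0 on the second. */
#include <stdio.h>
#include <sys/resource.h>

#define SIZE 100000000
static char big[SIZE];   /* zero-filled BSS; pages not materialized yet */

int main(void)
{
    struct rusage before, after;
    long i;

    getrusage(RUSAGE_SELF, &before);
    for (i = 0; i < SIZE; i++)   /* first pass: each 4k page faults in */
        big[i] = 0;
    getrusage(RUSAGE_SELF, &after);
    printf("first pass:  %ld minor faults\n",
           after.ru_minflt - before.ru_minflt);

    getrusage(RUSAGE_SELF, &before);
    for (i = 0; i < SIZE; i++)   /* second pass: pages already mapped */
        big[i] = 0;
    getrusage(RUSAGE_SELF, &after);
    printf("second pass: %ld minor faults\n",
           after.ru_minflt - before.ru_minflt);
    return 0;
}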
Adding a small access to each page up front (which should be negligible with regard to actual caching, since it fetches only one 64B line out of each 4k page) managed to make the results more even on my system (although this form of measurement isn't very accurate to begin with):
#include <stdio.h>
#include <time.h>

#define SIZE 100000000

char c[SIZE];
char c2[SIZE];

int main()
{
    int i;
    for(i = 0; i < SIZE; i += 4096)  //// access and modify each page once
        c[i] = 0;                    ////

    clock_t t = clock();
    for(i = 0; i < SIZE; i++)
        c[i] = 0;
    t = clock() - t;
    printf("%ld\n\n", (long)t);

    t = clock();
    for(i = SIZE - 1; i >= 0; i--)
        c[i] = 0;
    t = clock() - t;
    printf("%ld\n\n", (long)t);
}
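As a side note, clock() measures CPU time at fairly coarse resolution, which is part of why this kind of measurement is noisy. A wall-clock variant is sketched below, assuming a POSIX system where clock_gettime() with CLOCK_MONOTONIC is available (older glibc may need linking with -lrt); since_ms() is a hypothetical helper of mine, not a standard call:

#include <stdio.h>
#include <time.h>

#define SIZE 100000000
char c[SIZE];

/* elapsed wall-clock time since 'start', in milliseconds */
static double since_ms(struct timespec start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (now.tv_sec - start.tv_sec) * 1e3
         + (now.tv_nsec - start.tv_nsec) / 1e6;
}

int main(void)
{
    struct timespec t;
    long i;

    for (i = 0; i < SIZE; i += 4096)  /* pre-touch pages, as above */
        c[i] = 0;

    clock_gettime(CLOCK_MONOTONIC, &t);
    for (i = 0; i < SIZE; i++)
        c[i] = 0;
    printf("forward:  %.1f ms\n", since_ms(t));

    clock_gettime(CLOCK_MONOTONIC, &t);
    for (i = SIZE - 1; i >= 0; i--)
        c[i] = 0;
    printf("backward: %.1f ms\n", since_ms(t));
    return 0;
}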