Suppose a1
, b1
, c1
, and d1
point to heap memory and my numerical code has the following core loop.
const i
It's not because of a different code, but because of caching: RAM is slower than the CPU registers and a cache memory is inside the CPU to avoid to write the RAM every time a variable is changing. But the cache is not big as the RAM is, hence, it maps only a fraction of it.
The first code modifies distant memory addresses alternating them at each loop, thus requiring continuously to invalidate the cache.
The second code don't alternate: it just flow on adjacent addresses twice. This makes all the job to be completed in the cache, invalidating it only after the second loop starts.