Why are elementwise additions much faster in separate loops than in a combined loop?

旧巷少年郎 2020-11-22 09:49

Suppose a1, b1, c1, and d1 point to heap memory and my numerical code has the following core loop.

const i         


        
10 answers
  •  野性不改
    2020-11-22 10:04

    The difference is not in the code itself but in caching. RAM is much slower than the CPU registers, so the CPU keeps a cache on the chip to avoid going to RAM every time a variable changes. The cache is far smaller than RAM, though, so it can hold only a fraction of memory at a time.

    The first version (the combined loop) alternates between distant memory addresses on every iteration, so cached data keeps being evicted and reloaded.

    The second version (the separate loops) does not alternate: it simply walks over adjacent addresses, twice. Most of the work is then served from the cache, whose contents are replaced only once the second loop starts.
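
    For reference, here is a minimal sketch of the two loop shapes being compared. The code in the question is cut off, so this is an assumption based on the title and on the array names a1, b1, c1 and d1. The function names add_combined and add_separate and the use of std::vector<double> are illustrative only and do not come from the original post.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Combined form (assumed shape of the "first code"):
// every iteration touches all four arrays.
void add_combined(std::vector<double>& a1, const std::vector<double>& b1,
                  std::vector<double>& c1, const std::vector<double>& d1) {
    assert(a1.size() == b1.size() && c1.size() == d1.size() &&
           a1.size() == c1.size());
    for (std::size_t j = 0; j < a1.size(); ++j) {
        a1[j] += b1[j];
        c1[j] += d1[j];
    }
}

// Separate form (assumed shape of the "second code"):
// each loop streams through only two arrays.
void add_separate(std::vector<double>& a1, const std::vector<double>& b1,
                  std::vector<double>& c1, const std::vector<double>& d1) {
    assert(a1.size() == b1.size() && c1.size() == d1.size());
    for (std::size_t j = 0; j < a1.size(); ++j) {
        a1[j] += b1[j];
    }
    for (std::size_t j = 0; j < c1.size(); ++j) {
        c1[j] += d1[j];
    }
}
```

    Both functions compute the same result; only the order of memory accesses differs. Whether one is actually faster, and by how much, depends on the array size, how the arrays are laid out in memory, and the cache of the CPU in question, so it is worth timing both on your own machine.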
