Why is there huge performance hit in 2048x2048 versus 2047x2047 array multiplication?

不思量自难忘° 2020-11-29 17:06

I am doing some matrix multiplication benchmarking, as previously mentioned in "Why is MATLAB so fast in matrix multiplication?"

Now I've got another issue, when mu

10 answers
  •  醉话见心
    2020-11-29 17:47

    Probably a caching effect. With matrix dimensions that are large powers of two, and a cache size that is also a power of two, you can end up using only a small fraction of your L1 cache, which slows things down a lot. Naive matrix multiplication is usually limited by the need to fetch data into the cache. Optimized algorithms that use tiling (or cache-oblivious algorithms) focus on making better use of the L1 cache.
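
    To make the tiling idea concrete, here is a minimal sketch of the loop structure next to the naive triple loop. The names `matmul_naive`, `matmul_tiled`, and the `tile` parameter are made up for illustration and are not from the question. Python lists are not stored contiguously, so this shows only the access pattern, not the speedup you would measure in C# or C.

```python
def matmul_naive(a, b):
    """Textbook triple loop: the inner loop walks down a column of b."""
    n, p, q = len(a), len(b), len(b[0])
    c = [[0.0] * q for _ in range(n)]
    for i in range(n):
        for j in range(q):
            total = 0.0
            for m in range(p):
                total += a[i][m] * b[m][j]  # b is read with a stride of one full row
            c[i][j] = total
    return c


def matmul_tiled(a, b, tile=32):
    """Same product, computed block by block.

    `tile` is an assumed tuning parameter: the intent is that three
    tile x tile blocks fit in L1 cache at the same time.
    """
    n, p, q = len(a), len(b), len(b[0])
    c = [[0.0] * q for _ in range(n)]
    for i0 in range(0, n, tile):
        for m0 in range(0, p, tile):
            for j0 in range(0, q, tile):
                # Work only inside one block of a, b and c before moving on.
                for i in range(i0, min(i0 + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for m in range(m0, min(m0 + tile, p)):
                        a_im, row_b = row_a[m], b[m]
                        for j in range(j0, min(j0 + tile, q)):
                            row_c[j] += a_im * row_b[j]  # b is read along a row
    return c


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(matmul_naive(a, b))
    print(matmul_tiled(a, b, tile=1))
```

    Both functions return the same product. The tiled version only changes the order in which elements are visited, so that each block is reused while it is still likely to be in cache.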

    If you time other pairs (2^n-1, 2^n), I expect you'll see similar effects.

    To explain more fully: in the inner loop, where you access matice2[m,k], it's likely that matice2[m,k] and matice2[m+1,k] are offset from each other by 2048*sizeof(float) bytes and thus map to the same index in the L1 cache. With an N-way associative cache you will typically have 1-8 cache locations for all of these. Thus almost all of those accesses will trigger an L1 cache eviction and a fetch of data from a slower cache or main memory.
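
    The index arithmetic above can be checked with a few lines of Python. The cache geometry here (32 KiB, 8-way, 64-byte lines) is an assumption chosen as a common L1 data cache layout, not something stated in the question. The helper names are hypothetical, and the array is assumed to start at a cache-line-aligned address.

```python
# Assumed L1 data cache geometry; real CPUs differ.
CACHE_BYTES = 32 * 1024
WAYS = 8
LINE_BYTES = 64
NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)  # 64 sets under these assumptions
FLOAT_BYTES = 4  # sizeof(float)


def set_index(byte_offset):
    """Cache set that a byte offset (from an aligned array base) maps to."""
    return (byte_offset // LINE_BYTES) % NUM_SETS


def sets_touched_by_column(n, k=0):
    """Distinct sets hit when reading element [m, k] for m = 0..n-1
    of a row-major n x n float array."""
    return {set_index((m * n + k) * FLOAT_BYTES) for m in range(n)}


if __name__ == "__main__":
    for n in (2047, 2048):
        used = len(sets_touched_by_column(n))
        print(f"n={n}: column walk uses {used} of {NUM_SETS} sets")
```

    Under these assumptions, the 2048 case maps every element of the column to a single set, so only 8 lines (the number of ways) are available for 2048 distinct addresses. The 2047 case spreads the same walk across all 64 sets.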
