I am benchmarking matrix multiplication, as previously mentioned in Why is MATLAB so fast in matrix multiplication?
Now I've got another issue: multiplying two 2048x2048 matrices takes much longer than multiplying two 2047x2047 matrices. Why is that?
Probably a caching effect. With matrix dimensions that are large powers of two, and a cache size that is also a power of two, you can end up only using a small fraction of your L1 cache, slowing things down a lot. Naive matrix multiplication is usually constrained by the need to fetch data into the cache. Optimized algorithms using tiling (or cache-oblivious algorithms) focus on making better use of L1 cache.
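For illustration, here is a minimal sketch of what a tiled version can look like (this is not your code; the method name and blockSize are made up, and 64 is just a plausible tile size to tune for your CPU):

    using System;

    class TiledMultiply
    {
        // Tiled (blocked) multiplication of two n x n float matrices.
        // Each pass works on small blockSize x blockSize tiles of a and b,
        // so the data being reused stays resident in L1.
        static float[,] Multiply(float[,] a, float[,] b, int n, int blockSize = 64)
        {
            var c = new float[n, n];
            for (int i0 = 0; i0 < n; i0 += blockSize)
                for (int k0 = 0; k0 < n; k0 += blockSize)
                    for (int j0 = 0; j0 < n; j0 += blockSize)
                    {
                        int iMax = Math.Min(i0 + blockSize, n);
                        int kMax = Math.Min(k0 + blockSize, n);
                        int jMax = Math.Min(j0 + blockSize, n);
                        // Multiply one pair of tiles; all accesses stay within
                        // a small, cache-resident region of a and b.
                        for (int i = i0; i < iMax; i++)
                            for (int k = k0; k < kMax; k++)
                            {
                                float aik = a[i, k];
                                for (int j = j0; j < jMax; j++)
                                    c[i, j] += aik * b[k, j];
                            }
                    }
            return c;
        }
    }

The point is only that each tile gets reused many times while it is still in L1, so the timing no longer depends so strongly on whether the matrix width happens to be a power of two.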
If you time other pairs (2^n-1, 2^n), I expect you'll see similar effects.
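Something like this rough harness would show it (assuming your code is C# with float[,], as the matice2[m,k] indexing suggests; the sizes and the naive loop are just stand-ins for your own):

    using System;
    using System.Diagnostics;

    class SizeBenchmark
    {
        // Naive triple loop, just to expose the effect; the matrix contents
        // don't matter for timing, so the inputs are left zero-filled.
        static float[,] NaiveMultiply(float[,] a, float[,] b, int n)
        {
            var c = new float[n, n];
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                {
                    float sum = 0f;
                    for (int k = 0; k < n; k++)
                        sum += a[i, k] * b[k, j];
                    c[i, j] = sum;
                }
            return c;
        }

        static void Main()
        {
            // Time each (2^n - 1, 2^n) pair; if the cache-conflict explanation
            // holds, the power-of-two sizes should be noticeably slower.
            foreach (int size in new[] { 1023, 1024, 2047, 2048 })
            {
                var a = new float[size, size];
                var b = new float[size, size];
                var sw = Stopwatch.StartNew();
                NaiveMultiply(a, b, size);
                sw.Stop();
                Console.WriteLine($"{size} x {size}: {sw.Elapsed.TotalSeconds:F1} s");
            }
        }
    }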
To explain more fully: in the inner loop, where you access matice2[m,k], it's likely that matice2[m,k] and matice2[m+1,k] are offset from each other by 2048*sizeof(float) and thus map to the same index in the L1 cache. With an N-way associative cache you will typically have 1-8 cache locations available for all of these, so almost every one of those accesses will trigger an L1 cache eviction and a fetch of the data from a slower cache or from main memory.
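To put rough numbers on it, assuming a fairly typical L1 data cache of 32 KB, 8-way associative, with 64-byte lines (your CPU may differ): that gives 32768 / (64 * 8) = 64 sets. A stride of 2048 * sizeof(float) = 8192 bytes is exactly 128 cache lines, and 128 mod 64 = 0, so matice2[m,k], matice2[m+1,k], matice2[m+2,k], ... all compete for the 8 ways of a single set. With a 2047-wide matrix the stride is 8188 bytes, which is not a multiple of the set count, so walking down a column spreads the accesses over many sets instead of hammering one.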