Why is a naïve C++ matrix multiplication 100 times slower than BLAS?

我寻月下人不归 2020-12-25 09:17

I am taking a look at large matrix multiplication and ran the following experiment to form a baseline test:

  1. Randomly generate two 4096x4096 matrices X, Y from
5 Answers
  •  萌比男神i
    2020-12-25 09:41

    Strassen's algorithm has two advantages over the naïve algorithm:

    1. Better time complexity in terms of the number of operations (roughly O(n^2.807) versus O(n^3)), as other answers correctly point out;
    2. It is a cache-oblivious algorithm. The number of cache misses can differ by a factor on the order of B·√M, where B is the cache line size and M is the cache size.
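
    To make the cache-oblivious idea in the second point concrete, here is a minimal sketch of a recursive divide-and-conquer multiplication. It is not Strassen's algorithm (it performs eight recursive products per level rather than seven) and only illustrates how recursing on quadrants keeps the working set small without knowing the cache size. The function names, the row-major layout, the power-of-two size restriction and the base-case cutoff `kBaseCase` are all assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed cutoff below which plain loops are used; tune per machine.
constexpr std::size_t kBaseCase = 32;

// Computes C += A * B for n x n blocks stored row-major inside larger
// matrices whose row length (leading dimension) is ld.
void multiply_add_recursive(const double* A, const double* B, double* C,
                            std::size_t n, std::size_t ld) {
    if (n <= kBaseCase) {
        // Base case: i-k-j order so B and C are read along rows.
        for (std::size_t i = 0; i < n; ++i) {
            for (std::size_t k = 0; k < n; ++k) {
                const double a = A[i * ld + k];
                for (std::size_t j = 0; j < n; ++j) {
                    C[i * ld + j] += a * B[k * ld + j];
                }
            }
        }
        return;
    }

    // Split each matrix into four h x h quadrants.
    const std::size_t h = n / 2;
    const double* A11 = A;
    const double* A12 = A + h;
    const double* A21 = A + h * ld;
    const double* A22 = A + h * ld + h;
    const double* B11 = B;
    const double* B12 = B + h;
    const double* B21 = B + h * ld;
    const double* B22 = B + h * ld + h;
    double* C11 = C;
    double* C12 = C + h;
    double* C21 = C + h * ld;
    double* C22 = C + h * ld + h;

    // Eight recursive block products (Strassen would need only seven).
    multiply_add_recursive(A11, B11, C11, h, ld);
    multiply_add_recursive(A12, B21, C11, h, ld);
    multiply_add_recursive(A11, B12, C12, h, ld);
    multiply_add_recursive(A12, B22, C12, h, ld);
    multiply_add_recursive(A21, B11, C21, h, ld);
    multiply_add_recursive(A22, B21, C21, h, ld);
    multiply_add_recursive(A21, B12, C22, h, ld);
    multiply_add_recursive(A22, B22, C22, h, ld);
}

// Convenience wrapper: returns A * B for n x n row-major matrices.
// This sketch assumes n is a power of two.
std::vector<double> multiply_cache_oblivious(const std::vector<double>& A,
                                             const std::vector<double>& B,
                                             std::size_t n) {
    assert(n != 0 && (n & (n - 1)) == 0);
    assert(A.size() == n * n && B.size() == n * n);
    std::vector<double> C(n * n, 0.0);
    multiply_add_recursive(A.data(), B.data(), C.data(), n, n);
    return C;
}
```

    At some recursion depth the three sub-blocks fit in cache together, whatever the cache size happens to be, which is why no machine-specific tile size appears in the recursion itself.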

    I think the second point accounts for much of the slowdown you are experiencing. If you are running your application under Linux, I suggest running it under the perf tool, which reports how many cache misses the program incurs.
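
    One way to see the memory-access effect in isolation is to compare two loop orders that perform exactly the same arithmetic. The sketch below assumes row-major storage in a flat `std::vector<double>`. The names `multiply_ijk` and `multiply_ikj` are hypothetical, and this is not the code from the question. Called from a small driver on large matrices and run under something like `perf stat -e cache-references,cache-misses`, the two versions would be expected to show quite different miss counts, although the exact numbers depend on the machine and compiler flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<double>;  // n x n, row-major (assumed layout)

// Textbook order: the inner loop walks B down a column, so consecutive
// accesses to B are n doubles apart and rarely share a cache line.
Matrix multiply_ijk(const Matrix& A, const Matrix& B, std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n);
    Matrix C(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += A[i * n + k] * B[k * n + j];
            }
            C[i * n + j] = sum;
        }
    }
    return C;
}

// Reordered loops: the inner loop walks B and C along rows (unit stride),
// so each fetched cache line is fully used before moving on.
Matrix multiply_ikj(const Matrix& A, const Matrix& B, std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n);
    Matrix C(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const double a = A[i * n + k];
            for (std::size_t j = 0; j < n; ++j) {
                C[i * n + j] += a * B[k * n + j];
            }
        }
    }
    return C;
}
```

    Both functions perform the same number of multiplications and additions, so any runtime gap between them comes from how memory is traversed rather than from the operation count.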
