I tried to implement the Strassen algorithm for matrix multiplication in C++, but the result isn't what I expected. As you can see, Strassen always takes more time than the regular algorithm.
The big O of Strassen is O(N^(log2 7)) ≈ O(N^2.807), compared to O(N^3) for the regular algorithm; log2 7 is slightly less than 3.
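That exponent counts the scalar multiplications you need to make: each level of recursion replaces the 8 half-size block products of the regular method with 7, at the price of extra matrix additions.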
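To make the multiplication count concrete, here is a small sketch that only counts scalar multiplications, assuming N is a power of two and the recursion goes all the way down to 1x1 blocks. The function names are illustrative, not from the question's code.

```cpp
#include <cstdint>

// Scalar multiplications of the regular (schoolbook) algorithm: n^3.
std::uint64_t naiveMults(std::uint64_t n) { return n * n * n; }

// Pure Strassen, recursing down to 1x1 (n assumed to be a power of two):
// M(n) = 7 * M(n/2), M(1) = 1, hence M(n) = 7^(log2 n) = n^(log2 7).
std::uint64_t strassenMults(std::uint64_t n) {
    return n <= 1 ? 1 : 7 * strassenMults(n / 2);
}
```

For N = 1024 this gives 7^10 = 282,475,249 multiplications versus 1024^3 = 1,073,741,824, which is where the asymptotic advantage comes from.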
The bound ignores the cost of everything else you do (the extra matrix additions, memory allocation, copying), and it only promises that Strassen is "faster" once N is large enough, which yours probably is not.
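A rough way to see the effect of the extra additions is to count all scalar operations, still ignoring memory traffic entirely. This sketch assumes Strassen's original formulation (7 half-size products plus 18 block additions/subtractions per level), N a power of two, and a hypothetical `cutoff` below which the code falls back to the regular algorithm, which is the usual way practical implementations avoid recursing on tiny blocks.

```cpp
#include <cstdint>

// Regular algorithm on n x n: n^3 multiplications + n^2 (n - 1) additions.
std::uint64_t naiveOps(std::uint64_t n) { return 2 * n * n * n - n * n; }

// Strassen on n x n (n a power of two): 7 half-size products plus
// 18 additions/subtractions of (n/2) x (n/2) blocks per level.
// At or below `cutoff` it switches to the regular algorithm.
std::uint64_t strassenOps(std::uint64_t n, std::uint64_t cutoff) {
    if (n <= cutoff || n == 1) return naiveOps(n);
    std::uint64_t h = n / 2;
    return 7 * strassenOps(h, cutoff) + 18 * h * h;
}
```

By this count alone, pure Strassen (cutoff 1) needs 25 operations for a 2x2 product against 12 for the regular method, and only starts winning at N = 1024; with a cutoff of 64 it already wins at N = 128. Real timings will be worse than these counts suggest, since allocation and copying are not counted at all.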
Much of your implementation creates lots of sub-matrices, and my guess is that, the way you are storing them, you have to allocate memory and copy every time you do this. Having some kind of "slice" matrix (a view into the parent's storage rather than a copy), and a logical-transpose matrix if you can, would help you optimise what is probably the slowest part of your process.
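A minimal sketch of such a "slice" matrix, assuming row-major `double` storage (the names `Matrix`, `MatrixView`, `quadrant` and `multiplyAdd` are hypothetical, not from the question). The view is just a pointer, a size and the parent's row stride, so taking a quadrant costs a few integer operations instead of an allocation plus a copy.

```cpp
#include <cstddef>
#include <vector>

// Owning square matrix, row-major.
struct Matrix {
    std::size_t n;
    std::vector<double> data;
    explicit Matrix(std::size_t n) : n(n), data(n * n, 0.0) {}
};

// Non-owning window into a parent buffer. `stride` is the parent's row
// length, so moving down one row inside the view skips `stride` elements.
struct MatrixView {
    double* base;
    std::size_t n;
    std::size_t stride;

    double& at(std::size_t i, std::size_t j) const { return base[i * stride + j]; }

    // Quadrant (qi, qj), each 0 or 1: no allocation, no copying.
    MatrixView quadrant(std::size_t qi, std::size_t qj) const {
        std::size_t h = n / 2;
        return MatrixView{base + qi * h * stride + qj * h, h, stride};
    }
};

MatrixView view(Matrix& m) { return MatrixView{m.data.data(), m.n, m.n}; }

// C += A * B on views, e.g. as the regular kernel below a Strassen cutoff.
void multiplyAdd(const MatrixView& a, const MatrixView& b, const MatrixView& c) {
    for (std::size_t i = 0; i < a.n; ++i)
        for (std::size_t k = 0; k < a.n; ++k) {
            double aik = a.at(i, k);
            for (std::size_t j = 0; j < a.n; ++j)
                c.at(i, j) += aik * b.at(k, j);
        }
}
```

Writes through a view land directly in the parent matrix, so the recursion can pass quadrants of A, B and C down without copying them; only the temporaries that hold sums like A11 + A22 still need their own storage.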