Strassen-Winograd Algorithm

Posted by 我与影子孤独终老i on 2019-12-05 20:23:38

While it is asymptotically faster than the classical matrix-matrix product, even a very well-written Strassen-Winograd algorithm won't be faster than a naïve implementation for small sizes (and on modern architectures 64x64 is very small). This is because the extra matrix additions performed at every recursion step add considerable overhead.

That is why all optimized implementations have a cut-off size below which they switch to an optimized classical matrix-matrix product. The cut-off size depends strongly on the architecture and on the quality of the underlying classical implementation, and it can be well above n = 512.

If you want to write a serious implementation, I recommend reading a detailed description of the algorithm (you can start at Wikipedia, then continue with its references). If you are doing this as a toy project, at least stop the recursion below a certain threshold (which you need to tune), and try bigger sizes until you find measurable performance gains.
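To make the cut-off idea concrete, here is a minimal sketch in plain Python. It is not a tuned implementation: the function names (`classical`, `strassen_winograd`), the default `cutoff=64`, and the choice to fall back to the classical product for odd sizes are all assumptions made for illustration. It uses the commonly cited Winograd form with 7 sub-products and 15 block additions, and it allocates fresh temporaries at every level, which is exactly the overhead a serious implementation would avoid.

```python
def classical(A, B):
    """Classical O(n^3) product of two matrices given as lists of rows."""
    Bt = list(zip(*B))  # columns of B, so each entry is a row-by-column dot product
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split(M):
    """Split an even-sized square matrix into its four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen_winograd(A, B, cutoff=64):
    """Multiply square matrices A and B, recursing only above `cutoff`.

    `cutoff` is a hypothetical default; it has to be tuned per machine.
    Odd sizes simply fall back to the classical product (an assumption
    made to keep the sketch short; padding or peeling are alternatives).
    """
    n = len(A)
    if n <= cutoff or n % 2 != 0:
        return classical(A, B)

    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)

    # 8 pre-additions
    S1 = add(A21, A22)
    S2 = sub(S1, A11)
    S3 = sub(A11, A21)
    S4 = sub(A12, S2)
    T1 = sub(B12, B11)
    T2 = sub(B22, T1)
    T3 = sub(B22, B12)
    T4 = sub(T2, B21)

    # 7 recursive products
    P1 = strassen_winograd(A11, B11, cutoff)
    P2 = strassen_winograd(A12, B21, cutoff)
    P3 = strassen_winograd(S4, B22, cutoff)
    P4 = strassen_winograd(A22, T4, cutoff)
    P5 = strassen_winograd(S1, T1, cutoff)
    P6 = strassen_winograd(S2, T2, cutoff)
    P7 = strassen_winograd(S3, T3, cutoff)

    # 7 post-additions
    U2 = add(P1, P6)
    U3 = add(U2, P7)
    U4 = add(U2, P5)
    C11 = add(P1, P2)
    C12 = add(U4, P3)
    C21 = sub(U3, P4)
    C22 = add(U3, P5)

    # Reassemble the four quadrants into one matrix.
    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom
```

To tune the threshold, time `strassen_winograd(A, B, cutoff=c)` against `classical(A, B)` for growing sizes and several values of `c`. In an interpreted sketch like this the crossover will differ from that of a compiled implementation, so the numbers only indicate the method, not the result you would see in C or C++.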

Regarding memory allocation, since you are allowed to allocate memory in the main function, you can compute an upper bound on the memory required for temporary matrices as T*N^2*(1/4+1/16+...), where T is the number of temporaries per recursion depth and N is the size of the matrix, and re-use that memory at each recursion step. Because the geometric series 1/4+1/16+... sums to 1/3, this bound is at most T*N^2/3 elements. To simplify things, start with just one recursion step that immediately switches to a classical algorithm, and stay there until you manage to get any speed improvement. And again, read the existing literature about implementation details of the algorithm.
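The bound above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and `temporaries=4` in the usage below is a placeholder, since the real T depends on how the additions and products are scheduled in your implementation.

```python
def workspace_elements(n, temporaries, cutoff):
    """Elements needed for temporaries if each recursion depth re-uses one pool.

    At depth k the temporaries are (n / 2**k) x (n / 2**k) blocks, and the
    recursion stops once the size drops to `cutoff` or becomes odd.
    """
    total = 0
    while n > cutoff and n % 2 == 0:
        n //= 2
        total += temporaries * n * n
    return total


def workspace_upper_bound(n, temporaries):
    """Closed form of T*N^2*(1/4 + 1/16 + ...) = T*N^2/3."""
    return temporaries * n * n / 3


exact = workspace_elements(1024, temporaries=4, cutoff=64)
bound = workspace_upper_bound(1024, temporaries=4)
print(exact, bound)  # the exact count stays below the bound
```

Allocating `bound` elements (rounded up) once in the main function and handing each recursion depth its own slice avoids any allocation inside the recursion.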
