What is the best matrix multiplication algorithm? [closed]

Submitted by 拜拜、爱过 on 2019-12-17 16:07:54

Question


What is the best matrix multiplication algorithm? What does 'the best' mean for me? It means the fastest and ready for today's machines.

Please give links to pseudocode if you can.


Answer 1:


BLAS is the best ready-to-use efficient matrix multiplication library. There are many different implementations. Here is a benchmark I made for some implementations on a MacBook Pro with a dual-core 2.66 GHz Intel Core 2 Duo:

  • gotoBLAS2 (open-source) : https://www.tacc.utexas.edu/research-development/tacc-software/gotoblas2
  • ATLAS (open-source) : http://math-atlas.sourceforge.net/
  • Accelerate.framework (Apple) : http://developer.apple.com/performance/accelerateframework.html
  • a non-optimized, but portable, implementation that I called 'vanilla' (from the GSL)

There are also other commercial implementations that I didn't test here :

  • MKL (Intel) : http://software.intel.com/en-us/articles/intel-mkl/
  • ACML (AMD) : http://developer.amd.com/cpu/Libraries/acml/Pages/default.aspx



Answer 2:


The best matrix multiplication algorithm is the one that someone with detailed architectural knowledge has already hand-tuned for your target platform.

There are lots of good libraries that supply tuned matrix-multiply implementations. Use one of them.




Answer 3:


There are probably better ones, but these are the ones I've heard of (better than the standard cubic-complexity algorithm).

Strassen's - O(N^2.8)

Coppersmith Winograd - O(N^2.376)
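As an illustration of the first of these, here is a minimal pure-Python sketch of Strassen's scheme. It assumes square matrices stored as lists of rows whose size is a power of two; the helper names (`naive_matmul`, `add`, `sub`, `split`) and the `cutoff` parameter are made up for this example. It shows the seven-product recursion only and is not tuned for speed.

```python
def naive_matmul(a, b):
    """Standard cubic-time product of two matrices given as lists of rows."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def add(x, y):
    """Element-wise sum of two equally sized matrices."""
    return [[u + v for u, v in zip(r, s)] for r, s in zip(x, y)]


def sub(x, y):
    """Element-wise difference of two equally sized matrices."""
    return [[u - v for u, v in zip(r, s)] for r, s in zip(x, y)]


def split(m):
    """Split a square matrix of even size into four quadrants."""
    h = len(m) // 2
    return ([row[:h] for row in m[:h]], [row[h:] for row in m[:h]],
            [row[:h] for row in m[h:]], [row[h:] for row in m[h:]])


def strassen(a, b, cutoff=2):
    """Strassen product of n x n matrices, n a power of two (assumption)."""
    n = len(a)
    if n <= cutoff:
        # Small blocks: fall back to the ordinary algorithm.
        return naive_matmul(a, b)
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    # Seven recursive products instead of the usual eight.
    m1 = strassen(add(a11, a22), add(b11, b22), cutoff)
    m2 = strassen(add(a21, a22), b11, cutoff)
    m3 = strassen(a11, sub(b12, b22), cutoff)
    m4 = strassen(a22, sub(b21, b11), cutoff)
    m5 = strassen(add(a11, a12), b22, cutoff)
    m6 = strassen(sub(a21, a11), add(b11, b12), cutoff)
    m7 = strassen(sub(a12, a22), add(b21, b22), cutoff)
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    # Stitch the quadrants back together.
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom
```

In practice the cutoff would be set much higher than 2, because the extra additions and temporaries make the recursion slower than the ordinary loop on small blocks.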




Answer 4:


Why pseudocode? Why implement it yourself? If speed is your concern, there are highly optimized implementations available that include optimizations for specific instruction sets (e.g. SIMD). Implementing those all by yourself offers no real benefit (apart from maybe learning).

Take a look at different BLAS implementations, like:

http://www.netlib.org/blas/

http://math-atlas.sourceforge.net/




Answer 5:


Here is MIT's algorithms course and its matrix multiplication lecture:

http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-046j-introduction-to-algorithms-sma-5503-fall-2005/video-lectures/lecture-19-shortest-paths-iii-all-pairs-shortest-paths-matrix-multiplication-floyd-warshall-johnson/

matrix multiplication - O(n^3)

Strassen’s algorithm - O(n^2.8) http://en.wikipedia.org/wiki/Strassen_algorithm

Coppersmith–Winograd - O(n^2.376) http://en.wikipedia.org/wiki/Coppersmith%E2%80%93Winograd_algorithm
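For reference, the first two exponents follow from the recurrences of the block-recursive algorithms. This is the textbook derivation via the master theorem, assuming n is a power of two and that combining the sub-results costs Θ(n^2) additions:

```latex
\begin{align*}
T_{\text{naive}}(n) &= 8\,T_{\text{naive}}(n/2) + \Theta(n^2)
  &&\Rightarrow\; T_{\text{naive}}(n) = \Theta\!\left(n^{\log_2 8}\right) = \Theta(n^3) \\
T_{\text{Strassen}}(n) &= 7\,T_{\text{Strassen}}(n/2) + \Theta(n^2)
  &&\Rightarrow\; T_{\text{Strassen}}(n) = \Theta\!\left(n^{\log_2 7}\right) \approx \Theta\!\left(n^{2.807}\right)
\end{align*}
```

Saving one of the eight half-size products per level is what lowers the exponent from 3 to about 2.807.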




Answer 6:


Depends on the size of the matrix, and whether it's sparse or not.

For small-to-medium-sized dense matrices, I believe that some variation on the "naive" O(N^3) algorithm is a win, if you pay attention to cache locality and use the platform's vector instructions.
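One common variation of this kind is loop blocking (tiling). The sketch below shows only the loop structure in pure Python; the function name `blocked_matmul` and the default block size of 32 are assumptions for illustration. The cache benefit would only show up in a compiled language, where the innermost loop would also be the one to vectorize.

```python
def blocked_matmul(a, b, block=32):
    """Cubic-time product computed tile by tile (lists of rows)."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    # Outer three loops walk over tiles; inner three loops work inside one tile.
    for ii in range(0, n, block):
        for kk in range(0, m, block):
            for jj in range(0, p, block):
                for i in range(ii, min(ii + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + block, m)):
                        aik, row_b = row_a[k], b[k]
                        # Innermost loop runs along rows of B and C,
                        # i.e. over contiguous data in a row-major layout.
                        for j in range(jj, min(jj + block, p)):
                            row_c[j] += aik * row_b[j]
    return c
```

The block size would normally be chosen so that three tiles fit in the cache level being targeted, which is something to measure rather than guess.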

Data arrangement is important -- for cases where your standard matrix layout is cache-unfriendly (e.g., column-major * row-major), you should try binary decomposition of your matrix multiplication -- even if you don't use Strassen's or other "fast" algorithms, this order of operations can yield a "cache-oblivious" algorithm that automatically makes good use of every level of cache. If you have the luxury to rearrange your matrices, you might try combining this with a bit-interleaved (or "Z-order") ordering of data elements.
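The recursive decomposition described above can be sketched as follows. This is a pure-Python illustration under assumptions: the names `cache_oblivious_matmul` and `_rec` are invented here, matrices are lists of rows, and the `base` cutoff only limits recursion overhead (a strictly cache-oblivious formulation has no tuned parameter). The Z-order data layout is not shown.

```python
def _rec(a, b, c, ai, aj, bi, bj, ci, cj, n, m, p, base):
    """Accumulate A[ai:ai+n, aj:aj+m] * B[bi:bi+m, bj:bj+p] into C[ci:ci+n, cj:cj+p]."""
    if max(n, m, p) <= base:
        # Small sub-problem: plain triple loop on the sub-blocks.
        for i in range(n):
            for k in range(m):
                aik = a[ai + i][aj + k]
                for j in range(p):
                    c[ci + i][cj + j] += aik * b[bi + k][bj + j]
        return
    if n >= m and n >= p:
        # Halve the rows of A (and of C).
        h = n // 2
        _rec(a, b, c, ai, aj, bi, bj, ci, cj, h, m, p, base)
        _rec(a, b, c, ai + h, aj, bi, bj, ci + h, cj, n - h, m, p, base)
    elif m >= p:
        # Halve the shared dimension; both halves accumulate into the same C block.
        h = m // 2
        _rec(a, b, c, ai, aj, bi, bj, ci, cj, n, h, p, base)
        _rec(a, b, c, ai, aj + h, bi + h, bj, ci, cj, n, m - h, p, base)
    else:
        # Halve the columns of B (and of C).
        h = p // 2
        _rec(a, b, c, ai, aj, bi, bj, ci, cj, n, m, h, base)
        _rec(a, b, c, ai, aj, bi, bj + h, ci, cj + h, n, m, p - h, base)


def cache_oblivious_matmul(a, b, base=16):
    """Product of an n x m and an m x p matrix by always halving the largest dimension."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    _rec(a, b, c, 0, 0, 0, 0, 0, 0, n, m, p, max(1, base))
    return c
```

Because every level of the recursion works on sub-problems half the size of the previous one, at some depth the working set fits into each cache level without the code knowing the cache sizes.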

Finally, remember: premature optimization is the root of all evil. And when it's not premature any more, always profile & benchmark before, during, and after optimizing....
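A minimal benchmarking sketch using Python's standard `timeit` module, comparing two loop orders of the same cubic algorithm. The function names and the default matrix size are assumptions for illustration, and the timings depend entirely on the machine and interpreter.

```python
import random
import timeit


def matmul_ijk(a, b):
    """Textbook loop order: dot product of row i of A with column j of B."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            total = 0
            for k in range(m):
                total += a[i][k] * b[k][j]
            c[i][j] = total
    return c


def matmul_ikj(a, b):
    """Same arithmetic, but the inner loop walks along rows of B and C."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        row_c = c[i]
        for k in range(m):
            aik, row_b = a[i][k], b[k]
            for j in range(p):
                row_c[j] += aik * row_b[j]
    return c


def benchmark(n=60, repeats=3):
    """Return the best wall-clock time in seconds for each variant."""
    rng = random.Random(0)  # fixed seed so every run multiplies the same data
    a = [[rng.random() for _ in range(n)] for _ in range(n)]
    b = [[rng.random() for _ in range(n)] for _ in range(n)]
    results = {}
    for name, fn in (("ijk", matmul_ijk), ("ikj", matmul_ikj)):
        # Taking the minimum over repeats reduces noise from other processes.
        results[name] = min(timeit.repeat(lambda: fn(a, b), number=1, repeat=repeats))
    return results


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name}: {seconds:.4f} s")
```

Checking that both variants return the same result before timing them guards against "optimizing" the code into a wrong answer.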




Answer 7:


There is an algorithm called Cannon's algorithm, a distributed matrix multiplication algorithm.
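To give an idea of how Cannon's algorithm moves data, here is a single-process simulation in pure Python. It assumes square n x n matrices and a q x q grid of "processors" with q dividing n. The function name `cannon_matmul` and the list-of-blocks representation are assumptions for this sketch; a real implementation would exchange blocks between processes, for example with MPI.

```python
def cannon_matmul(a, b, q):
    """Simulate Cannon's algorithm on a q x q grid of processors."""
    n = len(a)
    if n % q != 0:
        raise ValueError("q must divide the matrix size")
    s = n // q  # each processor owns an s x s block

    def block(mat, bi, bj):
        return [row[bj * s:(bj + 1) * s] for row in mat[bi * s:(bi + 1) * s]]

    # Initial alignment: block row i of A is shifted left by i positions,
    # block column j of B is shifted up by j positions.
    a_loc = [[block(a, i, (i + j) % q) for j in range(q)] for i in range(q)]
    b_loc = [[block(b, (i + j) % q, j) for j in range(q)] for i in range(q)]
    c_loc = [[[[0] * s for _ in range(s)] for _ in range(q)] for _ in range(q)]

    for _ in range(q):
        # Every processor multiplies the two blocks it currently holds.
        for i in range(q):
            for j in range(q):
                ab, bb, cb = a_loc[i][j], b_loc[i][j], c_loc[i][j]
                for r in range(s):
                    for k in range(s):
                        ark = ab[r][k]
                        for col in range(s):
                            cb[r][col] += ark * bb[k][col]
        # Communication step: A blocks move one processor to the left,
        # B blocks move one processor up (both with wrap-around).
        a_loc = [[a_loc[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        b_loc = [[b_loc[(i + 1) % q][j] for j in range(q)] for i in range(q)]

    # Gather the result blocks back into one matrix.
    c = []
    for i in range(q):
        for r in range(s):
            row = []
            for j in range(q):
                row.extend(c_loc[i][j][r])
            c.append(row)
    return c
```

Each processor only ever talks to its left and upper neighbours and holds one block of each matrix at a time, which is what makes the scheme attractive on distributed-memory machines.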




Answer 8:


There is no "best algorithm" for all matrices on all modern CPUs.

You will need to do some research into the many methods available, and then find a best-fit solution to the particular problems you are calculating on the particular hardware you are dealing with.

For example, the "fastest" way on your hardware platform may be to use a "slow" algorithm but ask your GPU to apply it to 256 matrices in parallel. Or using a "fast" general-purpose (m×n) algorithm may produce much slower results than using an optimised 3x3 matrix multiply. If you really want it to be fast then you may want to consider getting down to the bare metal to make sure you make best use of specific CPU features like SIMD instructions, branch prediction and cache locality, at the expense of portability.
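To make the fixed-size case concrete, here is what a fully unrolled 3x3 multiply looks like. This is a sketch with an assumed function name (`matmul3x3`); the real payoff of such specialisation comes in compiled code, where the compiler can keep all nine values in registers and there is no loop overhead.

```python
def matmul3x3(a, b):
    """Product of two 3x3 matrices (lists of rows) with every loop unrolled."""
    (a00, a01, a02), (a10, a11, a12), (a20, a21, a22) = a
    (b00, b01, b02), (b10, b11, b12), (b20, b21, b22) = b
    return [
        [a00 * b00 + a01 * b10 + a02 * b20,
         a00 * b01 + a01 * b11 + a02 * b21,
         a00 * b02 + a01 * b12 + a02 * b22],
        [a10 * b00 + a11 * b10 + a12 * b20,
         a10 * b01 + a11 * b11 + a12 * b21,
         a10 * b02 + a11 * b12 + a12 * b22],
        [a20 * b00 + a21 * b10 + a22 * b20,
         a20 * b01 + a21 * b11 + a22 * b21,
         a20 * b02 + a21 * b12 + a22 * b22],
    ]
```

A general m×n routine has to handle arbitrary sizes, strides and edge cases, which is overhead that a routine for one known size can skip.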



Source: https://stackoverflow.com/questions/4455645/what-is-the-best-matrix-multiplication-algorithm
