Optimized matrix multiplication in C

后端 未结 13 2466
一整个雨季
一整个雨季 2020-11-30 01:44

I\'m trying to compare different methods for matrix multiplication. The first one is normal method:

do
{
    for (j = 0; j < i; j++)
    {
        for (k          


        
13条回答
  •  我在风中等你
    2020-11-30 02:27

    You should not write matrix multiplication. You should depend on external libraries. In particular you should use the GEMM routine from the BLAS library. GEMM often provides the following optimizations

    Blocking

    Efficient Matrix Multiplication relies on blocking your matrix and performing several smaller blocked multiplies. Ideally the size of each block is chosen to fit nicely into cache greatly improving performance.

    Tuning

    The ideal block size depends on the underlying memory hierarchy (how big is the cache?). As a result libraries should be tuned and compiled for each specific machine. This is done, among others, by the ATLAS implementation of BLAS.

    Assembly Level Optimization

    Matrix multiplicaiton is so common that developers will optimize it by hand. In particular this is done in GotoBLAS.

    Heterogeneous(GPU) Computing

    Matrix Multiply is very FLOP/compute intensive, making it an ideal candidate to be run on GPUs. cuBLAS and MAGMA are good candidates for this.

    In short, dense linear algebra is a well studied topic. People devote their lives to the improvement of these algorithms. You should use their work; it will make them happy.

提交回复
热议问题