How to optimize matrix multiplication operation [duplicate]

Submitted by 帅比萌擦擦* on 2019-11-29 00:09:53

Eigen is one of the fastest linear algebra libraries out there, if not the fastest. It is well written and of high quality. It also uses expression templates, which make for more readable code. Version 3 can use OpenMP for data parallelism.

#include <iostream>
#include <Eigen/Dense>

using Eigen::MatrixXd;

int main()
{
  MatrixXd m(2,2);
  m(0,0) = 3;
  m(1,0) = 2.5;
  m(0,1) = -1;
  m(1,1) = m(1,0) + m(0,1);
  std::cout << m << std::endl;
}

I think Boost uBLAS is definitely the way to go for this sort of thing. Boost is well designed, well tested, and used in a lot of applications.

Consider the GNU Scientific Library or MV++.

If you're okay with C, BLAS is a low-level library that incorporates both C and C-wrapped FORTRAN routines and is used by a huge number of higher-level math libraries.

I haven't used it myself, but another option might be Meschach, which seems to have decent performance.

Edit: With respect to your comment about not wanting to use libraries that use your graphics card, I'll point out that in many cases the libraries that use your graphics card are specialized implementations of standard (non-GPU) libraries. For example, various implementations of BLAS are listed on its Wikipedia page, and only some of them are designed to leverage your GPU.

There is a book called Introduction to Algorithms. You may want to check its chapter on Dynamic Programming. It covers matrix-chain multiplication, which uses dynamic programming to pick the cheapest order in which to multiply a chain of matrices. It's worth a read. This is in case you want to write your own logic instead of using a library.
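For reference, the dynamic programming problem in that chapter is usually presented as matrix-chain multiplication: given a chain of matrices, find the parenthesization that needs the fewest scalar multiplications. Below is a minimal sketch of that idea. The function name matrixChainCost and the dims convention are my own assumptions for illustration, not code taken from the book.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: dims has n + 1 entries for a chain of n matrices,
// where matrix i has shape dims[i] x dims[i + 1].
// Returns the minimum number of scalar multiplications for the whole chain.
long long matrixChainCost(const std::vector<long long>& dims)
{
    assert(dims.size() >= 2);
    const std::size_t n = dims.size() - 1;  // number of matrices in the chain

    // cost[i][j] = cheapest way to multiply matrices i..j (inclusive).
    std::vector<std::vector<long long>> cost(n, std::vector<long long>(n, 0));

    for (std::size_t len = 2; len <= n; ++len) {      // length of the sub-chain
        for (std::size_t i = 0; i + len <= n; ++i) {
            const std::size_t j = i + len - 1;
            cost[i][j] = std::numeric_limits<long long>::max();
            for (std::size_t k = i; k < j; ++k) {     // split between k and k + 1
                const long long c = cost[i][k] + cost[k + 1][j]
                                  + dims[i] * dims[k + 1] * dims[j + 1];
                if (c < cost[i][j]) {
                    cost[i][j] = c;
                }
            }
        }
    }
    return cost[0][n - 1];
}
```

Note that this only helps when you multiply three or more matrices in a row. It does not speed up a single product of two matrices.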

There are plenty of algorithms for efficient matrix multiplication.

Algorithms for efficient matrix multiplication

Look at the algorithms and find an implementation.

You can also make a multi-threaded implementation for it.
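One way this could look, as a rough sketch: split the rows of the result across a few std::thread workers, so that each worker writes to its own rows and no locking is needed. The Matrix alias and the names multiplyRows and multiplyParallel are assumptions made for this example. They are not the asker's class.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical matrix type for this sketch: a vector of rows.
using Matrix = std::vector<std::vector<double>>;

// Computes rows [rowBegin, rowEnd) of c = a * b.
static void multiplyRows(const Matrix& a, const Matrix& b, Matrix& c,
                         std::size_t rowBegin, std::size_t rowEnd)
{
    const std::size_t inner = b.size();
    const std::size_t cols = b[0].size();
    for (std::size_t i = rowBegin; i < rowEnd; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < inner; ++k) {
                sum += a[i][k] * b[k][j];
            }
            c[i][j] = sum;
        }
    }
}

// Splits the rows of the result into contiguous chunks, one per thread.
Matrix multiplyParallel(const Matrix& a, const Matrix& b, unsigned numThreads)
{
    assert(!a.empty() && !b.empty() && a[0].size() == b.size());
    const std::size_t rows = a.size();
    Matrix c(rows, std::vector<double>(b[0].size(), 0.0));

    numThreads = std::max(1u, numThreads);
    const std::size_t chunk = (rows + numThreads - 1) / numThreads;

    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < rows; begin += chunk) {
        const std::size_t end = std::min(rows, begin + chunk);
        // Each worker owns a disjoint range of rows in c.
        workers.emplace_back([&a, &b, &c, begin, end] {
            multiplyRows(a, b, c, begin, end);
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return c;
}
```

For small matrices the cost of starting threads will likely outweigh the gain, so it is worth measuring before adopting something like this.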

What I'd do is reduce the number of at(i) calls. For instance, in this loop:

for (int i = 0; i < this->rows; i++)
{
    for (int j = 0; j < matrix.GetColumns(); j++)
    {
        multipliedMatrix.datavector.at(i).at(j) = 0;
        for (int k = 0; k < this->columns; k++)
        {
            multipliedMatrix.datavector.at(i).at(j) += datavector.at(i).at(k) * matrix.datavector.at(k).at(j);
        }
    }
}

You're wasting a lot of time by calling at(i) on every iteration of the j and k loops, even though i only changes in the outer loop.

What I'd do instead is:

for (int i = 0; i < this->rows; i++)
{
    // I don't know the type of this object, but let's call it MatrixRow.
    // Look up row i once per outer iteration instead of once per j and k.
    MatrixRow & mmi = multipliedMatrix.datavector.at(i);
    MatrixRow & dvi = datavector.at(i);
    for (int j = 0; j < matrix.GetColumns(); j++)
    {
        // I don't know the element type either, but let's say it's a double.
        // Look up the output element once per j instead of once per k.
        double & mmij = mmi.at(j);
        mmij = 0;
        for (int k = 0; k < this->columns; k++)
        {
            mmij += dvi.at(k) * matrix.datavector.at(k).at(j);
        }
    }
}

The above suggestions might not be syntactically correct, but you get the idea.

Also, if your memory is contiguously allocated, you can get even further speedups by not doing lookups for each j and each k, but instead using the appropriate pointer increments.

Also, the loop bounds might be inefficient: they are evaluated on every iteration, and each evaluation is either a function call or a dereference. That is, this->rows, matrix.GetColumns(), and this->columns could be stored in local integers before the loops. This might improve speed a lot.
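To make the last two points concrete, here is a minimal sketch under the assumption that the matrix is stored as one contiguous row-major buffer. FlatMatrix and multiplyFlat are hypothetical names for this example, not the asker's class. I have also reordered the loops to i, k, j so that every pointer advances by exactly one element. That reordering is my own choice for the sketch, not something the advice above requires.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flat matrix: element (i, j) lives at data[i * cols + j].
struct FlatMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> data;
    FlatMatrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
};

FlatMatrix multiplyFlat(const FlatMatrix& a, const FlatMatrix& b)
{
    assert(a.cols == b.rows);

    // Read the loop bounds once into locals instead of on every iteration.
    const std::size_t rows = a.rows;
    const std::size_t inner = a.cols;
    const std::size_t cols = b.cols;

    FlatMatrix c(rows, cols);  // zero-initialized result

    const double* aPtr = a.data.data();  // walks through a, element by element
    double* cRow = c.data.data();        // start of row i of c
    for (std::size_t i = 0; i < rows; ++i, cRow += cols) {
        const double* bPtr = b.data.data();  // walks through all of b for each i
        for (std::size_t k = 0; k < inner; ++k) {
            const double aik = *aPtr++;      // a(i, k), read once
            double* cPtr = cRow;
            for (std::size_t j = 0; j < cols; ++j) {
                *cPtr++ += aik * *bPtr++;    // c(i, j) += a(i, k) * b(k, j)
            }
        }
    }
    return c;
}
```

No bounds-checked at() call or index multiplication remains in the inner loop. Whether this beats what an optimizing compiler already does with the indexed version is something to measure rather than assume.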
