Below is the C++ implementation comparing the time taken by Eigen and by a hand-written for loop to perform matrix-matrix products. The for loop has been optimised to minimise cache misses.
Your code is already well vectorized by the compiler. The key to higher performance is hierarchical blocking to optimize the usage of registers and of the different levels of cache. Partial loop unrolling is also crucial to improve instruction pipelining. Reaching the performance of Eigen's product requires a lot of effort and tuning.
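To make the idea concrete, here is a minimal sketch of one level of cache blocking with a 4-way unrolled inner loop for column-major N-by-N matrices, accumulating c += a*b. It is only illustrative, not Eigen's actual kernel; the function name blocked_prod and the tile sizes BI/BJ/BK are made-up and would need tuning per CPU, and Eigen additionally blocks at the register level and uses explicit SIMD kernels.

#include <algorithm>

// One level of cache blocking: the three loops are tiled so that a BI x BK
// tile of a, a BK x BJ tile of b and a BI x BJ tile of c stay in cache.
// Assumes c has been zeroed by the caller; computes c += a*b (column-major).
void blocked_prod(double* c, const double* a, const double* b, int N) {
  const int BI = 64, BJ = 64, BK = 64;   // hypothetical tile sizes, to be tuned
  for (int j0 = 0; j0 < N; j0 += BJ)
    for (int k0 = 0; k0 < N; k0 += BK)
      for (int i0 = 0; i0 < N; i0 += BI)
        for (int j = j0; j < std::min(j0 + BJ, N); ++j)
          for (int k = k0; k < std::min(k0 + BK, N); ++k) {
            const double bkj = b[k + j * N];
            const int iend = std::min(i0 + BI, N);
            int i = i0;
            // Partial 4-way unrolling of the innermost loop to improve pipelining.
            for (; i + 4 <= iend; i += 4) {
              c[i     + j * N] += a[i     + k * N] * bkj;
              c[i + 1 + j * N] += a[i + 1 + k * N] * bkj;
              c[i + 2 + j * N] += a[i + 2 + k * N] * bkj;
              c[i + 3 + j * N] += a[i + 3 + k * N] * bkj;
            }
            for (; i < iend; ++i)        // remainder
              c[i + j * N] += a[i + k * N] * bkj;
          }
}

Combined with keeping the per-tile accumulators in registers and using SIMD intrinsics, this is roughly the structure a tuned GEMM follows.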
It should also be noted that your benchmark is slightly biased and not fully reliable. Here is a more reliable version (you need the complete Eigen sources to get bench/BenchTimer.h):
#include <algorithm>
#include <iostream>
#include <Eigen/Dense>
#include <bench/BenchTimer.h> // shipped with Eigen's source tree

// c = a*b for column-major NxN matrices; the innermost loop runs down columns,
// so accesses to a and c are contiguous and cache misses stay low.
void myprod(double *c, const double* a, const double* b, int N) {
  int count = 0;      // index of c(0,j), i.e. j*N
  int count1, count2; // indices of a(0,k) and b(k,j)
  for (int j = 0; j < N; ++j, count += N) {
    for (int i = 0; i < N; ++i)
      c[count + i] = 0.0;
    count1 = 0;
    for (int k = 0; k < N; ++k, count1 += N) {
      count2 = count + k;
      for (int i = 0; i < N; ++i)
        c[count + i] += a[count1 + i] * b[count2];
    }
  }
}

int main() {
  int N = 2000;
  int tries = 3;                               // best of the trials is reported
  int rep = std::max<int>(1, 10000000/N/N/N);  // repetitions per trial
  Eigen::MatrixXd a_E = Eigen::MatrixXd::Random(N,N);
  Eigen::MatrixXd b_E = Eigen::MatrixXd::Random(N,N);
  Eigen::MatrixXd c_E(N,N);
  Eigen::BenchTimer t1, t2;
  BENCH(t1, tries, rep, c_E.noalias() = a_E*b_E);
  BENCH(t2, tries, rep, myprod(c_E.data(), a_E.data(), b_E.data(), N));
  std::cout << "\nTime taken by Eigen is: " << t1.best() << "\n";
  std::cout << "\nTime taken by for-loop is: " << t2.best() << "\n";
}
Compiling with Eigen 3.3-beta1 and FMA enabled (-mfma), the gap becomes much larger, almost one order of magnitude for N=2000.
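If you want to verify that FMA actually kicked in, Eigen reports which SIMD instruction sets it was compiled with; a small check could look like the following (assuming a reasonably recent Eigen, 3.3 or later, where EIGEN_VECTORIZE_FMA is defined when FMA is available):

#include <iostream>
#include <Eigen/Core>

int main() {
  // Lists the SIMD instruction sets Eigen's kernels are compiled for.
  std::cout << "SIMD in use: " << Eigen::SimdInstructionSetsInUse() << "\n";
#ifdef EIGEN_VECTORIZE_FMA
  std::cout << "FMA: enabled\n";     // -mfma (or -march=native on FMA hardware) took effect
#else
  std::cout << "FMA: not enabled\n";
#endif
}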