How to write a matrix-matrix product that can compete with Eigen?

伪装坚强ぢ 2020-12-24 03:31

Below is the C++ implementation comparing the time taken by Eigen and by a for loop to perform matrix-matrix products. The for loop has been optimised to minimise cache misses. T

3 Answers
  •  误落风尘
    2020-12-24 04:35

    Your code is already well vectorized by the compiler. The key to higher performance is hierarchical blocking, which optimizes the use of registers and of the different cache levels. Partial loop unrolling is also crucial to improve instruction pipelining. Reaching the performance of Eigen's product requires a lot of effort and tuning.
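
To make the blocking and unrolling idea concrete, here is a minimal sketch, not Eigen's actual kernel. It assumes square column-major matrices (Eigen's default layout), uses a single level of cache tiling with a hypothetical tile size `kBlock`, and unrolls the column loop by only 2. The names `naive_prod`, `blocked_prod` and `max_abs_diff` are made up for this illustration; a tuned kernel would add more blocking levels, packing and wider register tiles.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical tile size; production kernels tune this per cache level.
constexpr int kBlock = 64;

// Reference: plain column-major product c = a * b (all N x N).
void naive_prod(double* c, const double* a, const double* b, int N) {
  assert(N >= 0);
  const std::size_t n = static_cast<std::size_t>(N);
  std::fill(c, c + n * n, 0.0);
  for (std::size_t j = 0; j < n; ++j)
    for (std::size_t l = 0; l < n; ++l) {
      const double blj = b[l + j * n];
      for (std::size_t i = 0; i < n; ++i)
        c[i + j * n] += a[i + l * n] * blj;
    }
}

// Blocked product: kBlock-sized tiles over (j, l, i), with the column loop
// unrolled by 2 so that each loaded a(i,l) feeds two accumulations.
void blocked_prod(double* c, const double* a, const double* b, int N) {
  assert(N >= 0);
  const std::size_t n = static_cast<std::size_t>(N);
  const std::size_t bs = kBlock;
  std::fill(c, c + n * n, 0.0);
  for (std::size_t j0 = 0; j0 < n; j0 += bs) {
    const std::size_t j1 = std::min(j0 + bs, n);
    for (std::size_t l0 = 0; l0 < n; l0 += bs) {
      const std::size_t l1 = std::min(l0 + bs, n);
      for (std::size_t i0 = 0; i0 < n; i0 += bs) {
        const std::size_t i1 = std::min(i0 + bs, n);
        std::size_t j = j0;
        for (; j + 1 < j1; j += 2) {  // two columns of c at a time
          double* c0 = c + j * n;
          double* c1 = c0 + n;
          for (std::size_t l = l0; l < l1; ++l) {
            const double* al = a + l * n;
            const double b0 = b[l + j * n];
            const double b1 = b[l + (j + 1) * n];
            for (std::size_t i = i0; i < i1; ++i) {
              const double ail = al[i];
              c0[i] += ail * b0;
              c1[i] += ail * b1;
            }
          }
        }
        for (; j < j1; ++j) {  // leftover column, if the tile width is odd
          double* c0 = c + j * n;
          for (std::size_t l = l0; l < l1; ++l) {
            const double* al = a + l * n;
            const double b0 = b[l + j * n];
            for (std::size_t i = i0; i < i1; ++i) c0[i] += al[i] * b0;
          }
        }
      }
    }
  }
}

// Largest element-wise difference, for checking one result against another.
double max_abs_diff(const std::vector<double>& x, const std::vector<double>& y) {
  assert(x.size() == y.size());
  double d = 0.0;
  for (std::size_t k = 0; k < x.size(); ++k)
    d = std::max(d, std::fabs(x[k] - y[k]));
  return d;
}
```

Both functions take the same arguments as `myprod`, so either can be dropped into a benchmark in its place. Whether the blocked version is actually faster depends on N, the tile size and the compiler flags, so it should be measured rather than assumed.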

    Note also that your benchmark is slightly biased and not fully reliable. Here is a more reliable version (you need Eigen's complete sources to get bench/BenchTimer.h):

    #include <iostream>
    #include <Eigen/Dense>
    #include <bench/BenchTimer.h>
    
    void myprod(double *c, const double* a, const double* b, int N) {
      int count = 0;
      int count1, count2;
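      for (int j=0; j<N; ++j) {
        // ... (rest of the loop nest is missing from the source)
      }
    }
    
    int main() {
      // ... (declarations of N and tries are missing from the source)
      int rep = std::max<int>(1,10000000/N/N/N);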
    
      Eigen::MatrixXd a_E = Eigen::MatrixXd::Random(N,N);
      Eigen::MatrixXd b_E = Eigen::MatrixXd::Random(N,N);
      Eigen::MatrixXd c_E(N,N);
    
      Eigen::BenchTimer t1, t2;
    
      BENCH(t1, tries, rep, c_E.noalias() = a_E*b_E );
      BENCH(t2, tries, rep, myprod(c_E.data(), a_E.data(), b_E.data(), N));
    
      std::cout << "\nTime taken by Eigen is: " << t1.best() << "\n";
      std::cout << "\nTime taken by for-loop is: " << t2.best() << "\n";
    }
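
For readers without Eigen's sources at hand, the measurement pattern used above (several tries, each timing `rep` repetitions, with the best try reported) can be approximated with `std::chrono`. This is only a sketch: `best_time` is a hypothetical helper written for this illustration, not Eigen's `BENCH` macro or `BenchTimer` class, and it reports the time of a whole try rather than of a single repetition.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Hypothetical helper (not part of Eigen): runs `work` `rep` times per try,
// repeats that for `tries` tries, and returns the fastest try in seconds.
template <typename F>
double best_time(int tries, int rep, F&& work) {
  assert(tries > 0 && rep > 0);
  using clock = std::chrono::steady_clock;  // monotonic, unaffected by clock changes
  double best = std::numeric_limits<double>::infinity();
  for (int t = 0; t < tries; ++t) {
    const auto start = clock::now();
    for (int r = 0; r < rep; ++r) work();
    const std::chrono::duration<double> elapsed = clock::now() - start;
    best = std::min(best, elapsed.count());
  }
  return best;
}
```

A call such as `best_time(tries, rep, [&] { myprod(c_E.data(), a_E.data(), b_E.data(), N); })` would then stand in for the `BENCH` line followed by `t2.best()`. Taking the minimum over tries filters out runs disturbed by other processes or a cold cache, which is one reason a single timed run is less reliable.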
    

    When compiled with Eigen 3.3-beta1 and FMA enabled (-mfma), the gap becomes much larger: almost one order of magnitude for N=2000.
