I wrote a function that multiplies Eigen matrices of dimension 10x10 together. Then I wrote a naive multiply function, CustomMultiply, which was surprisingly 2x faster.
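For reference, a naive fixed-size multiply along these lines might look like the following. This is only a sketch of the idea, not the actual CustomMultiply from the benchmark (that code is not shown); plain arrays are used so the example stands alone without Eigen:

```cpp
#include <array>
#include <cstddef>

// 10x10 row-major matrix stored as a flat array.
using Mat10 = std::array<double, 10 * 10>;

// Naive triple loop. Because the trip counts are compile-time
// constants, the compiler is free to fully unroll and vectorize it,
// with no runtime branching on the matrix size.
Mat10 CustomMultiply(const Mat10& a, const Mat10& b) {
    Mat10 c{};
    for (std::size_t i = 0; i < 10; ++i)
        for (std::size_t k = 0; k < 10; ++k)
            for (std::size_t j = 0; j < 10; ++j)
                c[i * 10 + j] += a[i * 10 + k] * b[k * 10 + j];
    return c;
}
```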
(gdb) bt
#0 0x00005555555679e3 in Eigen::internal::gemm_pack_rhs, 4, 0, false, false>::operator()(double*, Eigen::internal::const_blas_data_mapper const&, long, long, long, long) ()
#1 0x0000555555566654 in Eigen::internal::general_matrix_matrix_product::run(long, long, long, double const*, long, double const*, long, double*, long, double, Eigen::internal::level3_blocking&, Eigen::internal::GemmParallelInfo*) ()
#2 0x0000555555565822 in BM_PairwiseMultiplyEachMatrixNoAlias(benchmark::State&) ()
#3 0x000055555556d571 in benchmark::internal::(anonymous namespace)::RunInThread(benchmark::internal::Benchmark::Instance const*, unsigned long, int, benchmark::internal::ThreadManager*) ()
#4 0x000055555556b469 in benchmark::RunSpecifiedBenchmarks(benchmark::BenchmarkReporter*, benchmark::BenchmarkReporter*) ()
#5 0x000055555556a450 in main ()
From the stack trace, Eigen's matrix multiplication is going through a generic multiply routine that loops over a dynamic matrix size. The custom implementation, by contrast, is aggressively vectorized and unrolled by clang, so there is much less branching.
Maybe there is some flag or option that tells Eigen to generate code specialized for this particular size.
However, once the matrix size gets larger, the Eigen version performs much better than the custom one.