Why is Boost's matrix multiplication slower than mine?


南笙 2020-12-23 13:31

I have implemented a matrix multiplication with boost::numeric::ublas::matrix (see my full, working Boost code)

3 answers

旧巷少年郎 (original poster)

2020-12-23 14:07
The slower performance of the uBLAS version can be partly explained by uBLAS's debugging features, as TJD pointed out.

Here's the time taken by the uBLAS version with debugging on:
```
real    0m19.966s
user    0m19.809s
sys     0m0.112s
```
Here's the time taken by the uBLAS version with debugging off (-DNDEBUG -DBOOST_UBLAS_NDEBUG compiler flags added):
```
real    0m7.061s
user    0m6.936s
sys     0m0.096s
```
So with debugging off, the uBLAS version is almost 3 times faster (about 7.1 s instead of about 20.0 s).
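A quick way to confirm that these two macros actually reached the compiler is to test them with the preprocessor. The following is only a minimal sketch: the function name `ublas_debug_flags` is hypothetical, and the code does not include or inspect uBLAS itself, it just reports what the current translation unit sees.

```cpp
#include <string>

// Reports whether the two macros named in the answer are defined in this
// translation unit. They are normally passed on the command line, e.g.
//   g++ -DNDEBUG -DBOOST_UBLAS_NDEBUG main.cpp
// (g++ and main.cpp are assumptions used only for illustration).
std::string ublas_debug_flags() {
    std::string s;
#ifdef NDEBUG
    s += "NDEBUG=on ";
#else
    s += "NDEBUG=off ";
#endif
#ifdef BOOST_UBLAS_NDEBUG
    s += "BOOST_UBLAS_NDEBUG=on";
#else
    s += "BOOST_UBLAS_NDEBUG=off";
#endif
    return s;
}
```

Printing the returned string at program start makes it easy to see whether a slow run was built with the debugging features still enabled.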

The remaining performance difference can be explained by the following section of the uBLAS FAQ, "Why is uBLAS so much slower than (atlas-)BLAS":

An important design goal of ublas is to be as general as possible.

This generality almost always comes with a cost. In particular, the prod function template can handle different types of matrices, such as sparse or triangular ones. Fortunately, uBLAS provides alternatives optimized for dense matrix multiplication, namely axpy_prod and block_prod. Here are the timings for the different methods:
```
method          time (s)
ijk algorithm   1.335
prod            7.061
axpy_prod       1.330
block_prod      1.278
```
As you can see, both axpy_prod and block_prod are somewhat faster than your implementation. Measuring just the computation time without I/O, removing unnecessary copying, and carefully choosing the block size for block_prod (I used 64) can make the difference more pronounced.
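To make the two suggestions about measurement and block size concrete, here is a minimal sketch in plain C++ without Boost. It is not the asker's code and not uBLAS's implementation: `multiply_ijk` assumes the "ijk algorithm" is the usual triple loop, `multiply_blocked` shows the general idea of a blocked (tiled) product, and `seconds` times only the call passed to it. All three names, and the flat row-major `std::vector<double>` layout, are assumptions made for illustration.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

using Matrix = std::vector<double>;  // n x n, row-major: element (i, j) is m[i * n + j]

// Plain i-j-k triple loop.
Matrix multiply_ijk(const Matrix& a, const Matrix& b, std::size_t n) {
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    return c;
}

// Same product computed tile by tile. bs is the block size and must be > 0;
// std::min handles tiles at the edge when n is not a multiple of bs.
Matrix multiply_blocked(const Matrix& a, const Matrix& b, std::size_t n, std::size_t bs) {
    Matrix c(n * n, 0.0);
    for (std::size_t ii = 0; ii < n; ii += bs)
        for (std::size_t kk = 0; kk < n; kk += bs)
            for (std::size_t jj = 0; jj < n; jj += bs)
                for (std::size_t i = ii; i < std::min(ii + bs, n); ++i)
                    for (std::size_t k = kk; k < std::min(kk + bs, n); ++k) {
                        const double aik = a[i * n + k];
                        for (std::size_t j = jj; j < std::min(jj + bs, n); ++j)
                            c[i * n + j] += aik * b[k * n + j];
                    }
    return c;
}

// Returns the wall-clock time of f() in seconds. Reading the input and
// printing the result stay outside the timed region.
template <class F>
double seconds(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

A call such as `seconds([&] { c = multiply_blocked(a, b, n, 64); })` can be repeated with a few different block sizes to see which one suits a given machine.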

See also the uBLAS FAQ and "Effective uBlas and general code optimization".