Comparing Python, Numpy, Numba and C++ for matrix multiplication

≡放荡痞女 提交于 2019-11-29 05:41:21
zmbq

Definitely use -O3 for optimization. This turns vectorizations on, which should significantly speed your code up.

Numba is supposed to do that already.

What I would recommend

If you want maximum efficiency, you should use a dedicated linear algebra library, the classic of which is BLAS/LAPACK libraries. There are a number of implementations, eg. Intel MKL. What you write is NOT going to outpeform hyper-optimized libraries.

Matrix matrix multiply is going to be the dgemm routine: d stands for double, ge for general, and mm for matrix matrix multiply. If your problem has additional structure, a more specific function may be called for additional speedup.

Note that Numpy dot ALREADY calls dgemm! You're probably not going to do better.

Why your c++ is slow

Your classic, intuitive algorithm for matrix-matrix multiplication turns out to be slow compared to what's possible. Writing code that takes advantage of how processors cache etc... yields important performance gains. The point is, tons of smart people have devoted their lives to making matrix matrix multiply extremely fast, and you should use their work and not reinvent the wheel.

In your current implementation most likely compiler is unable to auto vectorize the most inner loop because its size is 3. Also m2 is accessed in a "jumpy" way. Swapping loops so that iterating over p is in the most inner loop will make it work faster (col will not make "jumpy" data access) and compiler should be able to do better job (autovectorize).

for (int row = 0; row < m; row++) {
    for (int k = 0; k < n; k++) {
        for (int col = 0; col < p; col++) {
            m3.data_[p*row + col] += m1.data_[n*row + k] * m2.data_[p*k + col];
        }
    }
}

On my machine the original C++ implementation for p=10^6 elements build with g++ dot.cpp -std=c++11 -O3 -o dot flags takes 12ms and above implementation with swapped loops takes 7ms.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!