Multithreaded sparse matrix multiplication in Matlab

后端 未结 3 1793
故里飘歌
故里飘歌 2021-01-11 21:21

I am performing several matrix multiplications of an NxN sparse (~1-2%) matrix, let\'s call it B, with an NxM dense matrix, let\'s call it A (where M < N). N is large, as

相关标签:
3条回答
  • 2021-01-11 21:48

    On matlab central the same question was asked, and this answer was given:

    I believe the sparse matrix code is implemented by a few specialized TMW engineers rather than an external library like BLAS/LAPACK/LINPACK/etc... 
    

    Which basically means, that you are out of luck.


    However I can think of some tricks to achieve faster computations:

    1. If you need to do several multiplications: do multiple multiplications at once and process them in parallel?
    2. If you just want to do one multiplication: Cut the matrix into pieces (for example top half and bottom half), do the calculations of the parts in parallel and combine the results afterwards

    Probably these solutions will not turn out to be as fast as properly implemented multithreading, but hopefully you can still get a speedup.

    0 讨论(0)
  • 2021-01-11 21:52

    MATLAB already uses SuiteSparse by Tim Davis for many of its operation on sparse matrices (for example see here), but neither of which I believe are multithreaded.

    Usually computations on sparse matrices are memory-bound rather than CPU-bound. So even you use a multithreaded library, I doubt you will see huge benefits in terms of performance, at least not comparable to those specialized in dense matrices...

    After all that the design of sparse matrices have different goals in mind than regular dense matrices, where efficient memory storage is often more important.


    I did a quick search online, and found a few implementations out there:

    • sparse BLAS, spBLAS, PSBLAS. For instance, Intel MKL and AMD ACML do have some support for sparse matrices
    • cuSPARSE, CUSP, VexCL, ViennaCL, etc.. that run on the GPU.
    0 讨论(0)
  • 2021-01-11 21:58

    I ended up writing my own mex file with OpenMP for multithreading. Code as follows. Don't forget to use -largeArrayDims and /openmp (or -fopenmp) flags when compiling.

    #include <omp.h>
    #include "mex.h"
    #include "matrix.h"
    
    #define ll long long
    
    void omp_smm(double* A, double*B, double* C, ll m, ll p, ll n, ll* irs, ll* jcs)
    {
        for (ll j=0; j<p; ++j)
        {
            ll istart = jcs[j];
            ll iend = jcs[j+1];
            #pragma omp parallel for
            for (ll ii=istart; ii<iend; ++ii)
            {
                ll i = irs[ii];
                double aa = A[ii];
                for (ll k=0; k<n; ++k)
                {
                    C[i+k*m] += B[j+k*p]*aa;
                }
            }
        }
    }
    
    
    void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
    {
        double *A, *B, *C; /* pointers to input & output matrices*/
        size_t m,n,p;      /* matrix dimensions */
    
        A = mxGetPr(prhs[0]); /* first sparse matrix */
        B = mxGetPr(prhs[1]); /* second full matrix */
    
        mwIndex * irs = mxGetIr(prhs[0]);
        mwIndex * jcs = mxGetJc(prhs[0]);
    
        m = mxGetM(prhs[0]);  
        p = mxGetN(prhs[0]);
        n = mxGetN(prhs[1]);
    
        /* create output matrix C */
        plhs[0] = mxCreateDoubleMatrix(m, n, mxREAL);
        C = mxGetPr(plhs[0]);
    
        omp_smm(A,B,C, m, p, n, (ll*)irs, (ll*)jcs);
    }
    
    0 讨论(0)
提交回复
热议问题