I'm using the Anaconda distribution of Python, together with Numba, and I've written the following Python function that multiplies a sparse matrix A…
Numba has been updated and prange() works now! (I'm answering my own question.)
The improvements to Numba's parallel computing capabilities are discussed in this blog post, dated December 12, 2017. Here is a relevant snippet from the blog:
Long ago (more than 20 releases!), Numba used to have support for an idiom to write parallel for loops called prange(). After a major refactoring of the code base in 2014, this feature had to be removed, but it has been one of the most frequently requested Numba features since that time. After the Intel developers parallelized array expressions, they realized that bringing back prange would be fairly easy…
Using Numba version 0.36.1, I can parallelize my embarrassingly parallel for-loop using the following simple code:
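```python
import numba
import numpy as np


@numba.jit(nopython=True, parallel=True)
def csrMult_parallel(x, Adata, Aindices, Aindptr, Ashape):
    # Computes A @ x for A in CSR form. Each output row is independent,
    # so prange lets Numba distribute the rows across threads.
    numRowsA = Ashape[0]
    Ax = np.zeros(numRowsA)
    for i in numba.prange(numRowsA):
        Ax_i = 0.0
        for dataIdx in range(Aindptr[i], Aindptr[i + 1]):
            j = Aindices[dataIdx]
            Ax_i += Adata[dataIdx] * x[j]
        Ax[i] = Ax_i
    return Ax
```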
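The arguments `Adata`, `Aindices`, `Aindptr` and `Ashape` appear to correspond to the `data`, `indices`, `indptr` and `shape` attributes of a SciPy CSR matrix (an assumption; the call site isn't shown). A small hand-checkable sketch of that layout, running the same inner loop in plain Python on a 3x3 matrix:

```python
import numpy as np
import scipy.sparse as sp

# A small 3x3 matrix with four nonzeros.
dense = np.array([[1.0, 0.0, 2.0],
                  [0.0, 0.0, 3.0],
                  [4.0, 0.0, 0.0]])
A = sp.csr_matrix(dense)

print(A.data)     # nonzero values, row by row: 1, 2, 3, 4
print(A.indices)  # column index of each value: 0, 2, 2, 0
print(A.indptr)   # row i occupies data[indptr[i]:indptr[i+1]]: 0, 2, 3, 4

x = np.array([1.0, 10.0, 100.0])
Ax = np.zeros(A.shape[0])
for i in range(A.shape[0]):
    # Same loop structure as csrMult_parallel, without Numba.
    for dataIdx in range(A.indptr[i], A.indptr[i + 1]):
        Ax[i] += A.data[dataIdx] * x[A.indices[dataIdx]]

print(Ax)  # values 201, 300, 4 (matches A.dot(x))
```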
In my experiments, parallelizing the for-loop made the function run about eight times faster than the version I posted at the beginning of my question, which already used Numba but was not parallelized. The parallelized version was also about five times faster than Ax = A.dot(x), which uses SciPy's sparse matrix-vector multiplication. Numba has crushed SciPy, and I finally have a Python sparse matrix-vector multiplication routine that is as fast as MATLAB.