How to parallelize this Python for loop when using Numba
I'm using the Anaconda distribution of Python, together with Numba, and I've written the following Python function that multiplies a sparse matrix A (stored in a CSR format) by a dense vector x : @jit def csrMult( x, Adata, Aindices, Aindptr, Ashape ): numRowsA = Ashape[0] Ax = numpy.zeros( numRowsA ) for i in range( numRowsA ): Ax_i = 0.0 for dataIdx in range( Aindptr[i], Aindptr[i+1] ): j = Aindices[dataIdx] Ax_i += Adata[dataIdx] * x[j] Ax[i] = Ax_i return Ax Here A is a large scipy sparse matrix, >>> A.shape ( 56469, 39279 ) # having ~ 142,258,302 nonzero entries (so about 6.4% ) >>> type(