I have a simple example here to help me understand using numba and cython. I am new to both numba and cython. I've tried my best to incorporate all the tricks to make
Add parallelization. In Numba that just involves making the outer loop prange and adding parallel=True to the jit options:
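import numba
import numpy as np

@numba.jit(nopython=True, parallel=True)
def nb_expsum2(x):
    nx, ny = x.shape
    val = 0.0
    # prange splits the outer loop across threads; Numba handles `val +=` as a reduction
    for ix in numba.prange(nx):
        for iy in range(ny):
            val += np.exp(x[ix, iy])
    return val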
On my PC that gives a 3.2 times speedup over the non-parallel version. That said, on my PC both Numba and Cython beat NumPy as written.
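If you want to check the timings on your own machine, something along these lines should work. It is only a rough sketch: `np_expsum` is my guess at a plain NumPy baseline (not necessarily the version from the question), and `best_time` and the 1000x1000 test array are made up for illustration. The warm-up call matters for Numba, because the first call includes compilation time.

```python
import timeit

import numpy as np


def np_expsum(x):
    # Assumed NumPy baseline: exponentiate every element, then sum.
    return np.exp(x).sum()


def best_time(fn, x, repeats=5):
    # Hypothetical helper: best wall-clock time in seconds over `repeats` single calls.
    fn(x)  # warm-up call; for a Numba function this triggers JIT compilation
    return min(timeit.repeat(lambda: fn(x), number=1, repeat=repeats))


rng = np.random.default_rng(0)
x = rng.random((1000, 1000))  # arbitrary test size, adjust to your use case

t_np = best_time(np_expsum, x)
print(f"NumPy baseline: {t_np * 1e3:.2f} ms")

# With Numba installed, the parallel version can be timed the same way:
#   t_nb = best_time(nb_expsum2, x)
#   assert np.isclose(nb_expsum2(x), np_expsum(x))
```

Compare results with `np.isclose` rather than `==`: the parallel reduction adds the terms in a different order, so the last few bits of the floating-point sum can differ.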
You can also do parallelization in Cython - I haven't tested it here but I'd expect it to be similar to Numba in performance. (Note also that for Cython you can get nx and ny from x.shape[0] and x.shape[1], so you don't have to turn off bounds-checking and then rely entirely on user inputs to keep within the bounds.)