I have a simple example here to help me understand using numba and cython. I am new to both numba and cython. I've tried my best to incorporate all the tricks to make
Add parallelization. In Numba that just involves making the outer loop prange and adding parallel=True to the jit options:
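    import numba
    import numpy as np

    @numba.jit(nopython=True, parallel=True)
    def nb_expsum2(x):
        nx, ny = x.shape
        val = 0.0
        # prange splits the outer loop across threads; Numba treats
        # the += on val as a reduction and combines the partial sums.
        for ix in numba.prange(nx):
            for iy in range(ny):
                val += np.exp(x[ix, iy])
        return val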
On my PC that gives a 3.2 times speedup over the non-parallel version. That said, on my PC both Numba and Cython beat NumPy as written.
You can also do parallelization in Cython. I haven't tested it here, but I'd expect it to be similar to Numba in performance. (Note also that for Cython you can get nx and ny from x.shape[0] and x.shape[1], so you don't have to turn off bounds-checking and then rely entirely on user inputs to keep within the bounds.)