I am trying to define a function that contains an inner loop for simulating an integral.
The problem is speed. Evaluating the function once can take up to 30 seconds on
Cython doesn't offer automatic performance gains; you have to know its internals and check the generated C code.
In particular, if you want to improve loop performance, you have to avoid calling Python functions inside the loop, which you do a lot in this case (all the np.* calls are Python calls, as is slicing, and probably other things).
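To make the cost of per-iteration Python calls concrete, here is a minimal sketch in plain Python. Since the original function isn't shown, the integrand exp(-x^2) and all function names are hypothetical stand-ins. It computes the same Monte Carlo estimate three ways: calling a NumPy function on every scalar inside the loop, calling math.exp instead, and moving the whole computation out of the loop into a single array operation.

```python
import math
import timeit

import numpy as np


def integrate_np_calls(samples):
    """Loop that makes one Python-level NumPy call per sample."""
    total = 0.0
    for x in samples:
        total += np.exp(-x * x)  # NumPy call dispatched on a single scalar
    return total / len(samples)


def integrate_math_calls(samples):
    """Same loop, using the lighter math.exp for scalar values."""
    total = 0.0
    for x in samples:
        total += math.exp(-x * x)
    return total / len(samples)


def integrate_vectorized(samples):
    """No Python loop: one NumPy call applied to the whole array."""
    return float(np.mean(np.exp(-samples * samples)))


if __name__ == "__main__":
    # Hypothetical setup: uniform samples on [0, 1].
    samples = np.random.default_rng(0).uniform(0.0, 1.0, 20_000)
    for fn in (integrate_np_calls, integrate_math_calls, integrate_vectorized):
        elapsed = timeit.timeit(lambda: fn(samples), number=3)
        print(f"{fn.__name__}: {elapsed:.4f} s for 3 runs")
```

Exact timings depend on the machine, so none are quoted here. In Cython the analogous change would be to type the loop variables and replace the scalar np.exp call with the C function (from libc.math cimport exp), so that the loop body no longer goes through Python at all.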
See this page for general guidelines on optimizing performance with Cython (the -a switch, which produces an annotated HTML report highlighting the lines that interact with Python, is really handy here) and this one for specifics on optimizing NumPy code.