I found the bottleneck in my Python code and played around with Psyco etc., then decided to write a C/C++ extension for performance.
With the help of swig you almost do
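For anyone still at the "find the bottleneck" step, here is a minimal sketch using the standard library's `cProfile` and `pstats`. The functions `slow_sum_of_squares` and `main` are made-up stand-ins for your own code, not anything from the original post; the point is only to show how to see which function dominates the run time before rewriting it in C/C++.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Hypothetical hot spot: a pure-Python loop doing arithmetic."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def main():
    """Hypothetical entry point standing in for the real program."""
    return slow_sum_of_squares(200_000)


# Run main() under the profiler and keep its return value.
profiler = cProfile.Profile()
result = profiler.runcall(main)

# Render the five most expensive entries, sorted by cumulative time,
# into a string so the report can be printed or logged.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

If one function accounts for most of the cumulative time, that function is the candidate for a native extension; the rest of the program can usually stay in Python.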
There is an article worth reading on the topic: "Cython, pybind11, cffi – which tool should you choose?"
Quick recap for the impatient:
Cython compiles your Python to C/C++, allowing you to embed C/C++ calls in Python code. Uses static binding. For Python programmers.
pybind11 (and Boost.Python) is the opposite: you bind your code at compile time from the C++ side. For C++ programmers.
CFFI lets you bind native code dynamically at runtime. Simple to use, but with a higher performance penalty.
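To make the CFFI point concrete, here is a rough sketch of its runtime (ABI) mode calling `strlen` from the C standard library. It assumes the third-party `cffi` package is installed and a POSIX system where `ffi.dlopen(None)` exposes libc; the helper name `c_strlen` is my own, and it returns `None` when those assumptions do not hold.

```python
import sys

try:
    from cffi import FFI  # third-party package: pip install cffi
except ImportError:
    FFI = None  # assumption not met: CFFI is not installed


def c_strlen(data: bytes):
    """Return strlen(data) via libc, or None if CFFI/POSIX is unavailable.

    c_strlen is a hypothetical helper name used only for this sketch.
    """
    if FFI is None or sys.platform == "win32":
        return None
    ffi = FFI()
    # Declare the C signature as text; no compiler is involved in ABI mode.
    ffi.cdef("size_t strlen(const char *s);")
    # dlopen(None) loads the standard C library namespace (POSIX only).
    libc = ffi.dlopen(None)
    # bytes objects are passed as const char * automatically.
    return libc.strlen(data)


print(c_strlen(b"hello"))
```

Nothing is compiled here: the binding is resolved when the script runs, which is what makes this approach simple, and also why each call costs more than a statically bound Cython or pybind11 call.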