I found the bottleneck in my Python code, played around with Psyco etc., then decided to write a C/C++ extension for performance.
With the help of swig you almost do
SWIG 2.0.4 introduced a new -builtin option that improves performance. I did some benchmarking using an example program that makes a large number of fast calls into a C++ extension. I built the extension with Boost.Python, PyBindGen, SIP, and SWIG both with and without the -builtin option. Here are the results (average of 100 runs):
SWIG with -builtin: 2.67s
SIP: 2.70s
PyBindGen: 2.74s
Boost.Python: 3.07s
SWIG without -builtin: 4.65s
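SWIG used to be the slowest by a wide margin. With the new -builtin option it is roughly 1.7 times faster than without it and seems to be the fastest in this benchmark, although its lead over SIP and PyBindGen is small.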
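For anyone who wants to repeat this kind of measurement, here is a minimal sketch of a timing harness, not the exact program I used. The names `average_runtime` and `stand_in` are made up for illustration, and `stand_in` is a pure-Python placeholder: you would pass in the wrapped function from your own extension module instead. The call and run counts are also just assumptions.

```python
import timeit


def average_runtime(func, calls_per_run=100_000, runs=100):
    """Return the mean time in seconds of `runs` runs,
    where each run makes `calls_per_run` calls to `func`."""
    # timeit.repeat returns one total time per run (a list of floats).
    timings = timeit.repeat(func, number=calls_per_run, repeat=runs)
    return sum(timings) / len(timings)


def stand_in():
    """Hypothetical placeholder for a fast call into a C++ extension."""
    return 42


if __name__ == "__main__":
    # Fewer runs here so the demo finishes quickly.
    mean_seconds = average_runtime(stand_in, calls_per_run=100_000, runs=10)
    print(f"stand_in: {mean_seconds:.4f}s per run")
```

Build the same extension once per binding tool, import each build, and pass the same wrapped function to `average_runtime` so that only the wrapper overhead differs between measurements.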