I would like to write a program that makes extensive use of BLAS and LAPACK linear algebra functionalities. Since performance is an issue I did some benchmarking and would l
I've run your benchmark. There is no difference between C++ and numpy on my machine:
Do you think my approach is fair, or are there some unnecessary overheads I can avoid?
It seems fair due to there is no difference in results.
Would you expect that the result would show such a huge discrepancy between the c++ and python approach? Both are using shared objects for their calculations.
No.
Since I would rather use python for my program, what could I do to increase the performance when calling BLAS or LAPACK routines?
Make sure that numpy uses optimized version of BLAS/LAPACK libraries on your system.