How to get faster code than numpy.dot for matrix multiplication?
In "Matrix multiplication using hdf5" I use hdf5 (pytables) to multiply big matrices, but I was surprised to find that it runs even faster than plain numpy.dot with the matrices stored in RAM. What is the reason for this behavior?

Is there perhaps a faster function for matrix multiplication in Python? I still use numpy.dot for the small block matrix multiplications.

Here is some code:

Assume the matrices fit in RAM: test on matrices of size 10*1000 x 1000, using default numpy (I think with no BLAS lib).

Plain numpy arrays in RAM: time 9.48
If A,B in RAM, C on disk: time 1.48
If A,B,C on