I'm trying to compare different methods for matrix multiplication. The first one is the normal method:
/* Naive triple loop: C = A * B, where i is the matrix dimension */
for (j = 0; j < i; j++)
{
    for (k = 0; k < i; k++)
    {
        C[j][k] = 0;
        for (l = 0; l < i; l++)
            C[j][k] += A[j][l] * B[l][k];
    }
}
You should not write matrix multiplication yourself. You should depend on external libraries; in particular, use the GEMM routine from the BLAS library (a minimal call sketch appears at the end of this answer). Well-tuned GEMM implementations provide the following optimizations:
Blocking
Efficient matrix multiplication relies on blocking your matrix and performing several smaller blocked multiplies. Ideally, the size of each block is chosen to fit nicely into cache, greatly improving performance.
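As a rough sketch (not any particular library's kernel), a blocked version of the naive triple loop might look like this, where BLOCK is an assumed tile size and n is assumed divisible by BLOCK:

#define BLOCK 64  /* assumed tile size; real libraries tune this per machine */

/* Blocked multiply: C += A * B for row-major n x n doubles. Each
   BLOCK x BLOCK tile of A and B is reused many times while it is
   still resident in cache. C is assumed to start zeroed. */
void blocked_gemm(int n, const double *A, const double *B, double *C)
{
    for (int ii = 0; ii < n; ii += BLOCK)
        for (int kk = 0; kk < n; kk += BLOCK)
            for (int jj = 0; jj < n; jj += BLOCK)
                for (int i = ii; i < ii + BLOCK; i++)
                    for (int k = kk; k < kk + BLOCK; k++)
                    {
                        double a = A[i * n + k];
                        for (int j = jj; j < jj + BLOCK; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
}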
Tuning
The ideal block size depends on the underlying memory hierarchy (how big is the cache?). As a result, libraries should be tuned and compiled for each specific machine. This is done by, among others, the ATLAS implementation of BLAS.
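ATLAS discovers its parameters with an empirical timing search at build time. Purely as an illustrative stand-in (not ATLAS's actual method), a crude heuristic might size the tile so that one block each of A, B, and C fits in the L1 data cache:

#include <math.h>
#include <stddef.h>

/* Crude illustrative heuristic (an assumption, not ATLAS's real search):
   choose the largest block size whose three double-precision tiles
   (one each for A, B, and C) fit in the L1 data cache together. */
int pick_block_size(size_t l1_cache_bytes)
{
    return (int)sqrt((double)l1_cache_bytes / (3.0 * sizeof(double)));
}

With a 32 KB L1 cache this suggests tiles of roughly 36 x 36 doubles.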
Assembly Level Optimization
Matrix multiplication is so common that developers will optimize it by hand. In particular, this is done in GotoBLAS.
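As a toy illustration of what hand vectorization looks like (this is not GotoBLAS's actual kernel, which is written in assembly), here is the innermost update c[0..3] += a * b[0..3] written with AVX intrinsics:

#include <immintrin.h>  /* compile with -mavx */

/* Toy hand-vectorized inner step (illustrative only, not GotoBLAS code):
   update four consecutive elements of C with one vector multiply and add. */
void update4(double a, const double *b, double *c)
{
    __m256d va = _mm256_set1_pd(a);   /* broadcast the scalar A[i][k] */
    __m256d vb = _mm256_loadu_pd(b);  /* load B[k][j..j+3]            */
    __m256d vc = _mm256_loadu_pd(c);  /* load C[i][j..j+3]            */
    vc = _mm256_add_pd(vc, _mm256_mul_pd(va, vb));
    _mm256_storeu_pd(c, vc);          /* store updated C values       */
}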
Heterogeneous(GPU) Computing
Matrix multiplication is very compute-intensive (it performs O(n^3) FLOPs on O(n^2) data), making it an ideal candidate to be run on GPUs. cuBLAS and MAGMA are good candidates for this.
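A minimal sketch of a cuBLAS call (error checking omitted; dA, dB, and dC are assumed to be device pointers already allocated and filled with column-major data):

#include <cublas_v2.h>

/* Sketch: C = A * B on the GPU for column-major n x n doubles.
   dA, dB, and dC must already live in device memory. */
void gpu_gemm(int n, const double *dA, const double *dB, double *dC)
{
    cublasHandle_t handle;
    cublasCreate(&handle);
    const double alpha = 1.0, beta = 0.0;
    cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, n, n,
                &alpha, dA, n, dB, n,
                &beta,  dC, n);
    cublasDestroy(handle);
}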
In short, dense linear algebra is a well-studied topic. People devote their lives to the improvement of these algorithms. You should use their work; it will make them happy.
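For reference, the minimal call sketch promised above: replacing the hand-written loop with a single GEMM call through the CBLAS interface (provided by ATLAS, OpenBLAS, and others), assuming row-major n x n doubles:

#include <cblas.h>  /* link with -lcblas, -lopenblas, or similar */

/* C = 1.0 * A * B + 0.0 * C for row-major n x n doubles. */
void gemm_multiply(int n, const double *A, const double *B, double *C)
{
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n,
                1.0, A, n, B, n,
                0.0, C, n);
}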