Calculating a matrix product is much slower with SSE than with the straightforward algorithm
I want to multiply two matrices, once using the straightforward algorithm:

```cpp
template <typename T>
void multiplicate_straight(T ** A, T ** B, T ** C, int sizeX)
{
    // Transpose B into D so the inner loop reads both A and D row by row
    T ** D = AllocateDynamicArray2D<T>(sizeX, sizeX);
    transpose_matrix(B, D, sizeX);
    for (int i = 0; i < sizeX; i++)
    {
        for (int j = 0; j < sizeX; j++)
        {
            // C[i][j] is accumulated with +=, so C must be zero-initialized beforehand
            for (int g = 0; g < sizeX; g++)
            {
                C[i][j] += A[i][g] * D[j][g];
            }
        }
    }
    FreeDynamicArray2D<T>(D);
}
```

and once using SSE functions. For this I created two functions: template