Optimized matrix multiplication in C

一整个雨季 2020-11-30 01:44

I'm trying to compare different methods for matrix multiplication. The first one is the normal method:

do
{
    for (j = 0; j < i; j++)
    {
        for (k

13 answers
  •  时光取名叫无心
    2020-11-30 02:39

    Generally speaking, transposing B should end up being much faster than the naive implementation, but at the expense of wasting another NxN worth of memory. I just spent a week digging around matrix multiplication optimization, and so far the absolute hands-down winner is this:

    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                if (likely(k)) /* #define likely(x) __builtin_expect(!!(x), 1) */
                    C[i][j] += A[i][k] * B[k][j];
                else
                    C[i][j] = A[i][k] * B[k][j];
    

    This is even better than Drepper's method mentioned in an earlier comment, because it does not need to be tuned to the cache properties of the underlying CPU. The trick lies in reordering the loops so that all three matrices are accessed in row-major order: with j as the innermost loop, C[i][j] and B[k][j] are walked along contiguous rows while A[i][k] stays fixed.
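
    To make the comparison concrete, here is a minimal sketch of the three variants discussed above: the naive i-j-k order, the transposed-B version, and the i-k-j order. It is not the answerer's exact code. The function names, the flat row-major layout (n*n doubles per matrix) and the double element type are assumptions made for illustration, and the i-k-j version zeroes each row of C up front instead of branching on likely(k).

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumption of this sketch: every matrix is a flat, row-major array of
   n*n doubles, so element (r, c) lives at index r * n + c. */

/* Naive i-j-k order: the inner loop walks down a column of B,
   jumping n elements on every step. */
void matmul_ijk(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Transposed-B variant: copy B into a temporary n*n buffer so that the
   inner loop reads both operands along rows. Returns 0 on success and
   -1 if the temporary buffer cannot be allocated. */
int matmul_transposed(size_t n, const double *A, const double *B, double *C)
{
    if (n == 0)
        return 0;

    double *Bt = malloc(n * n * sizeof *Bt);
    if (Bt == NULL)
        return -1;

    for (size_t k = 0; k < n; k++)
        for (size_t j = 0; j < n; j++)
            Bt[j * n + k] = B[k * n + j];

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * Bt[j * n + k];
            C[i * n + j] = sum;
        }

    free(Bt);
    return 0;
}

/* i-k-j order: no extra buffer. The inner loop walks row i of C and
   row k of B contiguously while A[i][k] is held in a local variable. */
void matmul_ikj(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++) {
        /* Clear row i of C before accumulating into it. */
        for (size_t j = 0; j < n; j++)
            C[i * n + j] = 0.0;

        for (size_t k = 0; k < n; k++) {
            const double a = A[i * n + k];
            for (size_t j = 0; j < n; j++)
                C[i * n + j] += a * B[k * n + j];
        }
    }
}
```

    All three functions compute the same product, so they can be timed against each other on the same inputs. Which one wins, and by how much, depends on N, the compiler flags and the machine, so it is worth measuring rather than assuming.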
