A Cache Efficient Matrix Transpose Program?

無奈伤痛 2020-11-30 20:19

So the obvious way to transpose a matrix is to use:

  for( int i = 0; i < n; i++ )
    for( int j = 0; j < n; j++ )
      destination[j+i*n] = source[i+j*n];


        
6 Answers
  •  余生分开走
    2020-11-30 20:37

    Matrix multiplication comes to mind, but the cache issue there is much more pronounced, because each element is read N times.

    With a matrix transpose, you read the source in a single linear pass, and there is no way to optimize that. But you can process several rows simultaneously, so that you write several columns at once and thereby fill complete cache lines. You will only need three loops.

    Or do it the other way around and read in columns while writing linearly.
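
    A minimal sketch of the three-loop idea, assuming a square row-major matrix of float and separate source and destination buffers. The function name transpose_strips and the strip width STRIP are hypothetical, chosen for illustration; 16 floats corresponds to a 64-byte cache line, which is common but not universal. The outer loop picks a strip of rows, the middle loop walks the columns, and the inner loop writes one contiguous run in the destination:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strip width: 16 floats = 64 bytes, assuming a 64-byte cache line. */
enum { STRIP = 16 };

/* Transpose the n x n row-major matrix src into dst.
 * The buffers must not overlap (this is not an in-place transpose). */
void transpose_strips(float *dst, const float *src, size_t n)
{
    assert(dst != src);

    /* Loop 1: take STRIP source rows at a time. */
    for (size_t i0 = 0; i0 < n; i0 += STRIP) {
        /* Clamp the last strip when n is not a multiple of STRIP. */
        size_t i_end = (i0 + STRIP < n) ? i0 + STRIP : n;

        /* Loop 2: advance through the columns of those rows. */
        for (size_t j = 0; j < n; j++) {
            /* Loop 3: one element from each row of the strip.
             * The writes dst[j*n + i0 .. j*n + i_end-1] are contiguous,
             * so each destination cache line is filled in one go. */
            for (size_t i = i0; i < i_end; i++) {
                dst[j * n + i] = src[i * n + j];
            }
        }
    }
}
```

    Swapping the roles of the two index expressions gives the other variant mentioned above: strided reads and linear writes. Which one is faster, and the best strip width, depend on the machine, so it is worth measuring both.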
