A Cache-Efficient Matrix Transpose Program?

無奈伤痛 2020-11-30 20:19

So the obvious way to transpose a matrix is to use:

  for( int i = 0; i < n; i++ )
    for( int j = 0; j < n; j++ )
      destination[j+i*n] = source[i+j*n];
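A common way to add cache blocking to the loop above is two-dimensional loop tiling: the i and j ranges are split into tile x tile blocks so that the strided reads of source stay inside a small, cache-sized working set. The following is a minimal sketch, assuming a square n x n matrix indexed as in the question; the name transpose_tiled and the default tile size of 32 are illustrative choices, not taken from the post.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Loop-tiled version of the question's loop for a square n x n matrix:
// destination[j + i*n] = source[i + j*n].
// The default tile of 32 is an illustrative assumption; tune it per machine.
void transpose_tiled(double *destination, const double *source,
                     std::size_t n, std::size_t tile = 32) {
    assert(tile > 0 && "tile must be positive");
    for (std::size_t ii = 0; ii < n; ii += tile) {
        for (std::size_t jj = 0; jj < n; jj += tile) {
            // Clamp each tile at the edge so n need not be a multiple of tile.
            const std::size_t i_end = std::min(ii + tile, n);
            const std::size_t j_end = std::min(jj + tile, n);
            for (std::size_t i = ii; i < i_end; ++i)
                for (std::size_t j = jj; j < j_end; ++j)
                    destination[j + i * n] = source[i + j * n];
        }
    }
}
```

Within one tile, writes to destination are contiguous along j, while reads from source touch only tile distinct rows whose cache lines are reused as i advances.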


        
  •  谎友^ (OP)
    2020-11-30 20:48

    I had the exact same problem yesterday. I ended up with this solution:

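    #include <cstddef>  // std::size_t

    // Transposes the n x p row-major matrix src into the p x n matrix dst.
    // Rows of src are processed in strips of `block`; for each column j the
    // inner loop writes `block` consecutive elements of dst, and the strip's
    // source cache lines are reused across neighbouring values of j.
    void transpose(double *dst, const double *src, std::size_t n, std::size_t p) noexcept {
        const std::size_t block = 32;
        for (std::size_t i = 0; i < n; i += block) {
            for (std::size_t j = 0; j < p; ++j) {
                // The i + b < n check handles n not being a multiple of block.
                for (std::size_t b = 0; b < block && i + b < n; ++b) {
                    dst[j*n + i + b] = src[(i + b)*p + j];
                }
            }
        }
    }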
    

    This is about 4 times faster than the obvious solution on my machine.

    This solution handles rectangular matrices (src is n x p, dst is p x n) whose dimensions are not a multiple of the block size.
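    To check the claim about rectangular shapes, a small harness can compare the blocked function against a plain double loop on sizes that are not multiples of 32. The function is reproduced here without the undefined THROWS() macro so the block compiles on its own; transpose_naive and blocked_matches_naive are hypothetical helper names used only for this check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The answer's blocked transpose: src is n x p row-major, dst gets the p x n result.
void transpose(double *dst, const double *src, std::size_t n, std::size_t p) noexcept {
    const std::size_t block = 32;
    for (std::size_t i = 0; i < n; i += block)
        for (std::size_t j = 0; j < p; ++j)
            for (std::size_t b = 0; b < block && i + b < n; ++b)
                dst[j * n + i + b] = src[(i + b) * p + j];
}

// Straightforward reference transpose, used only to verify the blocked one.
void transpose_naive(double *dst, const double *src, std::size_t n, std::size_t p) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < p; ++j)
            dst[j * n + i] = src[i * p + j];
}

// Returns true if the blocked and naive versions agree on an n x p matrix.
bool blocked_matches_naive(std::size_t n, std::size_t p) {
    std::vector<double> src(n * p), a(n * p), b(n * p);
    for (std::size_t k = 0; k < src.size(); ++k) src[k] = static_cast<double>(k);
    transpose(a.data(), src.data(), n, p);
    transpose_naive(b.data(), src.data(), n, p);
    return a == b;
}
```

    For example, blocked_matches_naive(37, 50) exercises a partial final strip of 5 rows.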

    If dst and src are the same square matrix, an in-place function should be used instead:

    void transpose(double *m, size_t n) noexcept {
        size_t block = 0, size = 8;
        for (block = 0; block + size - 1

    I used C++11, but this could easily be translated into other languages.
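    The in-place snippet above ends mid-loop, so as a separate illustration (not the author's code) here is a minimal sketch of one common in-place approach, assuming a square n x n row-major matrix. It visits tile x tile blocks on and above the diagonal and swaps each element with its mirror below the diagonal. The name transpose_inplace and the default tile of 8 (one 64-byte cache line of doubles, an assumption about the hardware) are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// In-place transpose of a square n x n row-major matrix, blocked into
// tile x tile tiles. Only tiles on or above the diagonal are visited; each
// element (i, j) with i < j is swapped with (j, i) exactly once.
void transpose_inplace(double *m, std::size_t n, std::size_t tile = 8) noexcept {
    assert(tile > 0 && "tile must be positive");
    for (std::size_t ii = 0; ii < n; ii += tile) {
        for (std::size_t jj = ii; jj < n; jj += tile) {
            // Clamp at the edge so n need not be a multiple of tile.
            const std::size_t i_end = std::min(ii + tile, n);
            const std::size_t j_end = std::min(jj + tile, n);
            for (std::size_t i = ii; i < i_end; ++i) {
                // On a diagonal tile, start right of the diagonal so that
                // no pair is swapped twice (which would undo the swap).
                const std::size_t j_start = (ii == jj) ? i + 1 : jj;
                for (std::size_t j = j_start; j < j_end; ++j)
                    std::swap(m[i * n + j], m[j * n + i]);
            }
        }
    }
}
```

    Restricting the walk to the upper triangle of tiles is what makes the swap safe in place; visiting every tile would transpose each pair twice and leave the matrix unchanged.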
