What is the most efficient way to transpose a matrix in CUDA?

予麋鹿 2021-01-06 16:27

I have an M*N matrix in host memory, and when copying it into device memory I need it transposed into an N*M matrix. Is there any cuda (cuBLAS...)

3 Answers
  •  萌比男神i
    2021-01-06 17:16

    CULA has auxiliary routines that compute the transpose (culaDevice?geTranspose). For a square matrix you can also use in-place transposition (culaDevice?geTransposeInplace).

    Note: CULA has a free license available, if you meet certain conditions.
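
If pulling in a library is not an option, the transpose can also be done with a small hand-written kernel once the data is on the device. The sketch below is not CULA's API and not necessarily the fastest approach; it is the common shared-memory tiling pattern, written for a row-major float matrix. The names transposeTiled, transposeViaDevice and TILE are hypothetical, and error handling is kept minimal.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Tile edge length (an assumed value; 32 x 32 = 1024 threads per block).
constexpr int TILE = 32;

// Transposes the row-major rows x cols matrix `in` into the
// row-major cols x rows matrix `out`, one TILE x TILE tile per block.
__global__ void transposeTiled(const float* in, float* out, int rows, int cols) {
    // The +1 padding avoids shared-memory bank conflicts on the transposed read.
    __shared__ float tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;  // column index in `in`
    int y = blockIdx.y * TILE + threadIdx.y;  // row index in `in`
    if (x < cols && y < rows)
        tile[threadIdx.y][threadIdx.x] = in[y * cols + x];
    __syncthreads();

    // Swap the block indices so that writes to `out` stay coalesced.
    x = blockIdx.y * TILE + threadIdx.x;      // column index in `out`
    y = blockIdx.x * TILE + threadIdx.y;      // row index in `out`
    if (x < rows && y < cols)
        out[y * rows + x] = tile[threadIdx.x][threadIdx.y];
}

// Copies hostIn (rows x cols) to the device, transposes it there, and
// copies the cols x rows result back into hostOut.
cudaError_t transposeViaDevice(const float* hostIn, float* hostOut, int rows, int cols) {
    size_t bytes = sizeof(float) * rows * cols;
    float* dIn = nullptr;
    float* dOut = nullptr;

    cudaError_t err = cudaMalloc(&dIn, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMalloc(&dOut, bytes);
    if (err != cudaSuccess) { cudaFree(dIn); return err; }

    err = cudaMemcpy(dIn, hostIn, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(TILE, TILE);
        dim3 grid((cols + TILE - 1) / TILE, (rows + TILE - 1) / TILE);
        transposeTiled<<<grid, block>>>(dIn, dOut, rows, cols);
        err = cudaGetLastError();  // reports launch-configuration errors
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(hostOut, dOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dOut);
    return err;
}
```

Calling transposeViaDevice(in, out, M, N) with an M*N input fills out with the N*M transpose. In a real application the result would normally stay in device memory; the copy back to the host is only there so the sketch can be checked.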
